(54) Title: METHOD FOR UTILIZING THE 5' END OF mRNA FOR CLONING AND ANALYSIS

(57) Abstract: A method is disclosed for obtaining the 5' ends of transcribed regions from a plurality of nucleic acid fragments
obtained from biological materials or synthetic pools. DNA fragments encoding the 5' ends are enriched for their individual analysis
or for the analysis of concatemers thereof. The sequence information derived from 5' ends can be used for characterization and 
cloning of the transcriptome. 



WO 03/106672 



PCT/JP03/07514 



DESCRIPTION 

Method for utilizing the 5' end of mRNA for cloning and analysis

Technical Field

The present invention relates to a method for selectively collecting multiple nucleic acid 
fragments containing information on the nucleotide sequences at the 5' end of multiple 
mRNAs in a sample. 

Background Art 

In order to utilize genomic information, parts of the genome are transcribed into mRNA. For 
the understanding of the genome and its use in regulatory processes, information on 
individual mRNA species is required. Such information should include partial or full-length
nucleotide sequences and their relative or absolute quantities in a given biological context. 

Conventionally, the base sequences of mRNAs contained in a cell, tissue or organism have 
been analyzed by preparing a cDNA library through reverse transcription. The mRNAs are 
used as templates and individual cDNA fragments in said cDNA library are investigated.
Since a sample contains a large number of various mRNAs, the conventional method is of 
limited efficiency in analyzing gene expression profiles and identifying rare genes. Therefore, 
other technologies have been developed to monitor the expression patterns of mRNA in 
complex samples and identify genes by short sequence elements called tags. 

High-throughput expression profiling is commonly performed using so-called DNA
microarrays (Jordan B., DNA Microarrays: Gene Expression Applications, Springer-Verlag,
Berlin Heidelberg New York, 2001; and Schena A, DNA Microarrays, A Practical Approach,
Oxford University Press, Oxford 1999). For such experiments, specific probes representing
individual genes or transcripts are placed on a support and simultaneously hybridized with a
plurality of samples. A positive signal is obtained if a probe on the support reacts with a
molecule present in the sample. These experiments allow the parallel analysis of a large
number of genes or transcripts. However, the approach is limited in that only genes or
transcripts which have initially been identified by other experimental means can be studied.
Such means can include cDNA libraries, partial sequence tags and/or results obtained from
computer predictions. Due to this limitation of DNA microarray experiments, alternative
approaches based on partial sequences or tags obtained from a plurality of mRNA samples
are in use for gene discovery and expression profiling.

The so-called SAGE (Serial Analysis of Gene Expression) method is known as an efficient
method of obtaining partial information on the base sequences in mRNAs (Velculescu V.E.
et al., Science 270, 484-487 (1995)). According to this method, DNA concatemers are
formed by ligating multiple short DNA fragments (initially about 10 bp) containing
information on the base sequences at the 3' end of multiple mRNAs, and the base sequences
in these DNA concatemers are determined. This is a method for obtaining partial information
on the base sequences at the 3' end of multiple mRNAs. When only a short base sequence
close to the 3' end is available but the mRNA itself is already known, the SAGE method can
often identify a specific mRNA or gene, although the available base sequence is often as
short as about 10 bp. Recently, an improved version of SAGE, the so-called LongSAGE, has
been published. This method allows for the cloning of longer SAGE tags (Saha S. et al., Nat.
Biotechnol. 20, 508-12 (2002), US patent publication Nos. 20030008290 and 20030049653).
The SAGE method is currently in wide use as an important method for analyzing genes
expressed in specific cells, tissues or organisms, and SAGE tags are available for reference in
the public domain, e.g. under http://cgap.nci.nih.gov/SAGE.

While the SAGE method can be used to learn a partial base sequence at the 3' end of mRNAs,
it is difficult to clone new genes based on the information in such short sequences at the 3'
end only. Despite its multiple applications, SAGE does not teach how to obtain cDNA clones
close to the 5' end of mRNAs. In fact, 4 bp restriction enzymes of Class IIS are used. A 4 bp
cutter usually cleaves on average every few hundred nucleotides, which is on average one tenth of
the average size of an mRNA transcript. Thus SAGE principles strongly suggest that 3' ends
are collected with high prevalence, and no information can be collected about the 5' end for
most of the transcripts. In addition, the initial version of SAGE was limited by the short
length of the tags: in most cases only tags of 10 bp length were used, so that a reliable analysis
and annotation of the information were not possible.
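
The two quantitative points in the preceding paragraph can be checked with simple arithmetic. The following Python sketch assumes a random sequence with equal base frequencies, which real transcripts only approximate; the function names are illustrative and are not part of the SAGE method.

```python
# Back-of-the-envelope arithmetic, assuming all four bases are equally likely.

def expected_site_spacing(site_length: int) -> int:
    """Mean distance in bp between occurrences of one specific site of this length."""
    return 4 ** site_length


def distinct_tags(tag_length: int) -> int:
    """Number of different sequences a tag of this length can take."""
    return 4 ** tag_length


print(expected_site_spacing(4))  # spacing of a 4 bp recognition site
print(distinct_tags(10))         # sequence space of a 10 bp tag
print(distinct_tags(20))         # sequence space of a 20 bp tag
```

Under this assumption a 4 bp site recurs about every 256 bp, in line with the "few hundred nucleotides" stated above, and a 10 bp tag has about one million possible sequences, compared with about 10^12 for a 20 bp tag.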

Although techniques exist for the collection of full-length cDNA clones and sequences
derived therefrom, they focus on collecting the full-length cDNA clones and not
fragments covering the 5' ends only. Full-length cDNA cloning approaches are therefore not
suitable for high throughput identification and analysis of start sites of transcription and the
related promoter regions.

Summary of the Invention

Accordingly, it is an object of the present invention to provide a new general method that 
enables the acquisition of information on the base sequences at 5' ends of mRNAs in a 
sample. It is another object of the present invention to make it possible to clone new genes
and analyze genomic sequence information which relates to coding and regulatory regions.
The information may include statistics on the transcriptional start sites derived from large 
numbers of 5' end sequences. 

Thus, the present invention refers generally to the concept of isolating portions of nucleic
acids corresponding to the 5' end of transcribed genes and using them for further
high-throughput analysis such as sequencing. The present invention offers a novel way to combine
contrasting teachings and provide a new, high throughput approach to 5' ends which is useful
for promoter mapping and analysis. The method of the present invention is effective for
analyzing the mRNAs contained in the sample, for discovering and cloning new genes and for
studying gene regulation. The use of the present invention to study and analyze complex
regulatory networks in combination with the ability to identify and clone new genes opens a
wide area of applications for monitoring biological systems and their status in development,
homeostasis, disease, and beyond.

The present invention provides a new method for promoter analysis using 5' ends, while
SAGE does not allow any promoter analysis due to the use of unrelated 3' ends.

After devoted research, the present inventors have completed the present invention by
finding that selectively collecting multiple nucleic acid fragments containing
information on the base sequences at the 5' end of the mRNAs makes it possible not only to
acquire information on the base sequences in mRNAs but also to clone new
genes; they have also found a concrete method for attaining this goal.

That is, the present invention provides a method for preparing concatemers of a plurality of
nucleic acid fragments related to nucleotide sequences of 5' end regions of a plurality of
mRNAs in a sample, comprising: a first step of selectively collecting a plurality of
first-strand cDNAs which contain sequences complementary to 5' end regions of mRNAs from
cDNAs that have been formed using mRNAs present in the sample as templates; a second
step of obtaining fragments of the first-strand cDNAs collected in the first step; a third step
of selectively collecting fragments which contain at least sequences complementary to the 5'
end regions of said mRNAs; and a fourth step of ligating the collected fragments individually
or in the form of a concatemer.

The present invention further provides a method for preparing concatemers of a plurality of
nucleic acid fragments related to nucleotide sequences of 5' end regions of a plurality of
mRNAs in a sample, comprising: a first step of obtaining fragments of full-length cDNAs; a
second step of selectively collecting fragments which contain at least sequences
complementary to the 5' end regions of said mRNAs; and a third step of ligating the collected
fragments to form a concatemer. The present invention still further allows for the
fractionation or isolation of the 5' end sequences before cloning and sequencing. In such
cases first-strand cDNAs can be separated by subtractive hybridizations using drivers holding
pluralities of nucleic acids of biological or artificial content. The present invention may be
used for the identification of differentially expressed genes.

The present invention also provides a method for determining nucleotide sequences of 5' end
regions of a plurality of mRNAs by sequencing concatemers prepared by the method
according to the present invention. By using concatemers to obtain information on a large
number of 5' end sequence tags as presented in the invention, it is possible to effectively map
transcriptional start sites and the related promoter sequences.
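
Once concatemers have been sequenced and split into individual 5' end tags, the tags can be tallied to estimate how often each 5' end was observed. The following is a minimal Python sketch, assuming the tags are already available as plain strings; the example tags are invented and the function name is illustrative.

```python
from collections import Counter


def tag_frequencies(tags):
    """Return {tag: (count, fraction of all tags)} for a list of tag sequences."""
    counts = Counter(tags)
    total = sum(counts.values())
    return {tag: (n, n / total) for tag, n in counts.items()}


# Invented example data: three reads of one tag, one read of another.
observed = [
    "ACGTTGCAAGGCTTAA",
    "ACGTTGCAAGGCTTAA",
    "GGCATTCGATCCGTAA",
    "ACGTTGCAAGGCTTAA",
]
for tag, (count, fraction) in tag_frequencies(observed).items():
    print(tag, count, round(fraction, 2))
```

Mapping each tag to a genomic position, which this sketch does not attempt, would turn such counts into statistics on transcriptional start sites.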

The present invention still further provides concatemers prepared by the method according to
the present invention. The present invention still further provides a vector comprising said
concatemer according to the present invention. The present invention still further provides
sequence tags derived from said concatemers prepared according to the present invention.
The present invention still further provides means to use the sequences derived from said
concatemers to analyze the content of a sample comprising a plurality of RNAs. The present invention
still further provides means to use the sequences derived from said concatemers to identify
regions in the genome which are required for gene regulation and gene expression.

The invention is not limited to the use of concatemers for sequencing of 5' ends, and 
modifications at particular steps for the enrichment of 5' ends and their cloning as disclosed 
here allow for the individual sequencing of specific 5' ends. Such embodiments of the 
invention would include a modification of the first and second steps, in which a linker that is 
specifically bound to a solid matrix is used. The cDNA bound to the support would then be
used to prepare the sequencing reactions. 

Brief Description of the Drawings 

Fig. 1 shows exemplary principal workflows according to the present invention, following
procedures described in the examples.

Fig. 2 shows an example of a principal workflow of the invention for the cloning of 5'
end specific tags into concatemers.

Fig. 3 shows a principal workflow according to the present invention to illustrate an
alternative approach for the direct sequencing of 5' end tags.

Fig. 4 shows examples of the ligation of the first linker for the cloning of 5' end specific tags.
The examples specify the linkers used according to the protocols described in
Examples 1 to 3.

Fig. 5 shows examples of the ligation of the second linker for the cloning of 5' end specific
tags. The examples specify the linkers used according to the protocols
described in Examples 1 to 3.

Fig. 6 shows examples illustrating the structure of a dimer of 5' end tags prepared in
accordance with Examples 1 to 3. Note that in the case of concatemers prepared according to
Example 1 different linker sites can be found, as XmaJI and XbaI create the same overhangs
after digestion, which can be recombined. One example of such a concatemer is given in the
figure.

Detailed Description of Preferred Embodiments 

As described above, the method of the present invention can comprise, but is not limited to,
roughly three steps, each of which further comprises a plurality of steps. Each step will now
be explained below. Concrete examples of each step are described in detail in the
working examples given later.

STEP 1

Step 1 is to selectively collect cDNAs containing a site corresponding to the 5' end of 
mRNAs in a sample. The cDNAs may be synthesized for instance by using said mRNAs as 
templates. 

Either total RNA or mRNA taken from a desired cell, tissue, or organism can be used as the
starting substrate. Methods for the preparation of total RNA and mRNA are already known and
are also described in the working examples given later. Alternatively, a cDNA library
itself may be cleaved if it carries a recognition site for a Class IIS or Class III enzyme in
proximity of the 5' end of its inserts.

Also, a full-length cDNA library may be used to isolate the 5' end nucleic acids
corresponding to the 5' end of the transcribed part of a gene.

Step 1 itself can be conducted by a publicly known method. In other words, methods to
construct full-length cDNAs and methods to synthesize cDNA fragments at least containing a
site corresponding to the 5' end of the mRNAs are already known, and any of these methods
can be adopted. One of the preferable methods is the cap trapper method (e.g. Piero Carninci
et al., Methods in Enzymology, Vol. 303, pp. 19-44, 1999). This cap trapper method shall be
explained below; however, the invention is not limited to the use of the cap trapper method
and other approaches to enrich or select full-length cDNAs could be applied as well.

The cap trapper method first synthesizes the first-strand cDNA with a reverse transcriptase
using RNA as a template. This can be conducted by a known method. The cDNA can be
primed with an oligo-dT primer or, when the template RNA is mRNA, it can be primed with
a random primer. It is advisable to add trehalose to the reaction solution because it raises the
efficiency of the reverse transcription reaction by stabilizing the reverse transcriptase (US patent
No. 6,013,488). It is preferable to use 5-methyl-dCTP instead of standard dCTP, because it
protects the cDNA from internal cleavage by several restriction enzymes and thus prevents
unintended cleavage to a considerable extent. In addition, after the first-strand
cDNA synthesis, proteins and digested peptides may be removed by CTAB (cetyl trimethyl
ammonium bromide) treatment, or by other more general methods to purify cDNA.

Next, a selective binding substance is bound to the cap structure of the mRNA. A "selective
binding substance" here means a substance that selectively binds to a specific substance.
The selective binding substance is preferably, but not limited to, biotin. The
cap structure is found at the 5' end of mRNA but not in transfer RNA (tRNA)
or ribosomal RNA (rRNA), thus allowing for a specific selection of mRNA molecules.
Therefore, even if total RNA was used as the starting substrate, the selective binding
substance only binds to mRNA. In addition, the selective binding substance does not bind to
mRNA if the cap structure at the 5' end has been lost. Biotin can be bound to the cap
structure by a known method. For instance, the cap structure can be biotinylated by first
oxidizing the diol group within the cap structure by treating the mRNA with an oxidizer such as
NaIO4 and then reacting it with biotin hydrazide.

Single-strand RNA is then cleaved by means such as RNase I treatment. Alternatively, any other
RNase that can cleave single-strand RNAs but not cDNA/RNA hybrids, or cocktails of RNases that can
cleave various single-strand RNA sequences with various specificities, can be used.
In an RNA/cDNA hybrid whose first-strand cDNA has not been extended to
the site corresponding to the 5' end of the RNA, the vicinity of the 5' end of the RNA is
single-stranded because it is not hybridized with cDNA. Thus, the hybrid is cleaved at the
single-stranded part and loses its cap structure through this step. Consequently, after this step
only those mRNA/cDNA hybrids whose cDNA fully extends to the 5' end of the
mRNA retain the cap structure.
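
The selection logic of the biotinylation and RNase I steps can be summarized as a simple filter. The following Python sketch is only a schematic model under stated assumptions: each hybrid is reduced to two yes/no properties, and the class and function names are hypothetical and do not correspond to any laboratory protocol.

```python
from dataclasses import dataclass


@dataclass
class Hybrid:
    name: str
    cdna_reaches_cap: bool         # first-strand cDNA extends to the 5' end of the RNA
    biotinylated_cap: bool = True  # RNA carried a cap that could be biotinylated


def retains_cap_after_rnase(hybrid: Hybrid) -> bool:
    """The biotinylated cap is kept only if no single-stranded RNA is left next to it."""
    return hybrid.biotinylated_cap and hybrid.cdna_reaches_cap


def capture(hybrids):
    """Names of the hybrids that would still bind the streptavidin-coated support."""
    return [h.name for h in hybrids if retains_cap_after_rnase(h)]


pool = [
    Hybrid("full-length", cdna_reaches_cap=True),
    Hybrid("truncated cDNA", cdna_reaches_cap=False),
    Hybrid("uncapped RNA", cdna_reaches_cap=True, biotinylated_cap=False),
]
print(capture(pool))
```

In this model only the first hybrid is retained, which mirrors the statement above that hybrids with incompletely extended cDNA lose their cap.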

A matching selective binding substance fixed to a support, which selectively binds to the
aforementioned selective binding substance, is prepared. In the present specification, a
"matching selective binding substance" means a substance that selectively binds to the
aforementioned selective binding substance, which, in the case where the selective binding
substance is biotin, would be avidin, streptavidin or a derivative thereof that binds
specifically to biotin or its derivatives. The support can favorably be, but is not limited to,
magnetic beads, particularly magnetic porous glass beads. Since magnetic porous glass beads
to which streptavidin has been fixed are commercially available, such commercial
streptavidin coated magnetic porous glass beads can be used. Similarly, other materials such
as latex beads, latex magnetic beads, agarose beads, polystyrene beads, sepharose beads or
the like could be used instead of porous glass beads. Furthermore, the invention is not limited to
the use of the biotin-avidin system; other binding substances could be used, such as a
digoxigenin tag attached to the cap structure together with digoxigenin-recognizing
antibodies attached to a solid matrix.

Following this, the aforementioned mRNA/cDNA hybrid with the cap structure is made to
react with the aforementioned matching selective binding substance fixed to the support in
order to bind the selective binding substance on the cap structure with the matching selective
binding substance on the support, thereby immobilizing the mRNA/cDNA hybrid with the
cap structure on the support. When magnetic beads are used as the support, applying a
magnetic force can quickly collect the magnetic beads. Meanwhile, in order to prevent
non-specific binding to the support, it is preferable to treat the support with a large excess of
DNA-free tRNA for blocking such binding before conducting this reaction. Other substances
that are suitable for blocking the surface are nucleic acids or derivatives, for instance total
RNA or oligonucleotides; proteins, for instance bovine serum albumin; and polysaccharides, for
instance glycogen, dextran sulphate, heparin or other polysaccharides. Hybrid molecules
containing parts of all of the above could be used to mask non-specific binding sites.

The above focuses on the case where Step 1 is conducted by the cap trapper method, but 
other methods can also be used as long as they can selectively collect cDNAs containing a 
site complementary to the 5' end of mRNA. 

Alternatively to the cap-selection, one could dephosphorylate the 5' ends of mRNAs with a
phosphatase, such as BAP (bacterial alkaline phosphatase), followed by treatment with the
decapping enzyme TAP (tobacco acid pyrophosphatase). Subsequently a ribonucleotide or a
deoxyribonucleotide can be attached to the 5' end of the mRNA instead of the original cap
structure with RNA ligase (Maruyama K., Sugano S., Gene 138, 171-4 (1994)). In this way, for
instance a Class II or Class III recognition site can be placed in the oligonucleotide or
ribonucleotide sequence used during the ligation step, which is placed at the 5' end of a
cDNA or RNA. This Class II or Class III restriction enzyme can then be used to cleave
within the cDNA and produce the 5' end tag.

Alternatively to biotin, a cap-binding protein (Pelletier et al. Mol Cell Biol 1995 15:3363-71;
Edery L et al., Mol Cell Biol 1995 Jun; 15(6):3363-71) or an antibody (Theissen H et al.
EMBO J. 1986 Dec 1; 5(12):3209-17) that specifically binds to the cap structure can be used
as the aforementioned selectively binding substance.

Alternatively, one could use methods to attach oligonucleotides chemically to the cap
structure as described by Genset. This method is based on the oxidation of the cap structure (US
patent No. 6,022,715). This allows (1) adding to the cap an oligonucleotide which may
contain a recognition site for a Class IIS or Class III restriction enzyme, and (2) preparing
first-strand cDNA which then switches to second-strand cDNA synthesis.

Alternatively, one could use the cap-switch method as described by Clontech (US patent No.
5,962,272). One could prepare the first-strand cDNA in the presence of a cap-switch
oligonucleotide which carries a recognition site for a substance capable of recognizing
nucleic acids and cleaving them at a distance from the recognition sequence, so that a Class IIS or
Class III restriction enzyme may be used. The cap-switch mechanism lets the first-strand
synthesis continue on the cap-switch oligonucleotide. This can be continued by a
second-strand cDNA synthesis, or followed by a PCR step as described for instance in the
SMART(TM) Clontech cloning system.

In another embodiment, depending on the quality of the RNA, random priming and extending the
cDNA up to the cap structure may allow for the utilization of 5' ends. Particular enzymes and
reaction conditions sometimes allow the cap site to be reached with high efficiency (Carninci et al.,
BioTechniques, 2002). Even without a cap-selection it is possible to attach, in place of the cap
structure, oligonucleotides which carry Class IIS or Class III restriction enzyme sites that
would later be used to produce concatemers.

Finally, the cDNA can be cleaved with the Class II (Class IIS or Class IIG) or Class III
restriction enzyme to produce 5' end tags. The 5' end tags are used in the subsequent
formation of concatemers. Any other methods, including mechanical cleavage, may possibly
be used.

Fig. 1 summarizes exemplary workflows according to the present invention.

According to Fig. 1, to perform the method of the present invention, 5' ends of transcribed 
regions can be isolated from a plurality of RNA molecules or total RNAs, a plurality of RNA 
molecules which have been enriched for mRNA fractions, or a full-length cDNA library. 

When applying the present method to a plurality of total RNA or mRNA molecules, mRNA
molecules may be used as templates to synthesize complementary cDNA strands. The cDNA
strands then proceed to a selection step that enriches mRNA/cDNA hybrids comprising the 5'
ends of the transcribed regions. After the removal or destruction of the mRNA portion by
hydrolysis with an alkali, a first-strand cDNA pool comprising the 5' ends of the transcribed
regions is prepared.

In a different embodiment of the invention, a full-length cDNA library can be used to prepare
an RNA pool comprising the 5' ends of the cDNA clones. A single-stranded cDNA pool is
then synthesized using the aforementioned RNA pool as a template. A first-strand cDNA
portion thereof is obtained after the removal or destruction of the RNA molecules by
hydrolysis with an alkali, and the resulting first-strand cDNA pool comprises the 5' ends of
the transcribed regions. The transcribed regions are available for further processing under
the present invention. Note that when starting from a full-length cDNA library no selection
for 5' ends is required.

STEP 2 

In continuation of Step 1, the following Step 2 is carried out to selectively collect cDNA
fragments that at least contain a site complementary to the 5' end of the mRNA.

When using the aforementioned cap trapper method, the first-strand cDNA that has been
immobilized on the support is released. This can be done by treating the support with
alkali, such as sodium hydroxide. Alternatively to alkali, an enzymatic reaction with RNase H
(which cleaves only the RNA hybridized to DNA) could be used. The alkali treatment
releases the cDNA from the mRNA/cDNA hybrid, which is bound to the support through the cap on
the mRNA, and separates the cDNA from the mRNA to leave only the first-strand cDNA on its
own.

Then, a linker is added to the cDNA. The linker carries a sequence that is recognized in a
sequence-specific manner by a substance having an enzymatic activity that cleaves the recognized
DNA outside the recognition sequence. Such substances include but are not limited to certain
Class II and Class III restriction enzymes.

In this embodiment, a linker that at least carries a Class IIS or Class III restriction enzyme
site and a random oligomer part at the 3' end is ligated to the end of this first-strand cDNA
which corresponds to the 5' end of the aforementioned mRNA (i.e., the 3' end of the cDNA).
For the later cloning of the 5' end sequence tags into concatemers, it is preferable, but not
essential, to introduce a second recognition site into the linker. The second recognition site
should be distinct from the aforementioned recognition site used for, e.g., the Class
IIS or Class III restriction enzyme.

This can preferably be conducted using a linker that carries a Class IIS or Class III restriction
enzyme site and a random oligomer part (SSLLM (single strand linker ligation method), Y.
Shibata et al., BioTechniques, Vol. 30, No. 6, pp. 1250-1254, (2001)). The Class IIS and
Class III restriction enzymes are restriction enzyme groups that cause cleavage at positions other
than the recognition site. An example of a Class IIS restriction enzyme is, but is not
limited to, GsuI. GsuI treatment cleaves one of the strands 16 bp downstream
from the recognition site, and the other strand 14 bp downstream from the recognition site.
Another suitable example is MmeI, which cleaves 20 and 18 bases, respectively, from its
recognition sequence. An example of a Class III restriction enzyme is, but is not
limited to, EcoP15I, which cleaves 25 and 27 bp, respectively, from its recognition site.
The random oligomer part is located at the 3' end of the linker, and though the number of
bases is not particularly restricted, the recommended number is 5 to 9, or more preferably, 5
to 6. The Class IIS or Class III restriction enzyme site should be located close to the
aforementioned random oligomer part, so that the cleavage point comes within the cDNA.
The linker should preferably be a linker of double-stranded DNA of which the
aforementioned random oligomer part protrudes at the 3' end and provides the binding end.
In addition, it is advisable to bind a selective binding substance such as biotin to the linker in
advance to facilitate its collection later.
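
The requirement that the cleavage point must fall within the cDNA can be expressed as a small calculation. The following Python sketch uses the cut distances quoted above; it assumes that these distances are counted from the end of the recognition site and that the random oligomer part pairs with cDNA sequence, so that only linker bases placed between the site and the random part shorten the tag. The `spacer_bp` parameter and the function name are hypothetical.

```python
# Cut distances (first strand, second strand) in bp, as quoted in the text above.
CUT_DISTANCES = {
    "GsuI": (16, 14),
    "MmeI": (20, 18),
    "EcoP15I": (25, 27),
}


def cdna_bases_in_tag(enzyme: str, spacer_bp: int = 0):
    """cDNA-derived bases on each strand when spacer_bp linker bases
    separate the recognition site from the random oligomer part."""
    lengths = tuple(d - spacer_bp for d in CUT_DISTANCES[enzyme])
    if min(lengths) <= 0:
        raise ValueError("cleavage point falls inside the linker, not the cDNA")
    return lengths


print(cdna_bases_in_tag("GsuI"))               # site directly next to the random part
print(cdna_bases_in_tag("MmeI", spacer_bp=4))  # four extra linker bases in between
```

Each additional linker base between the site and the random part costs one base of tag, which is why the site is placed as close to the random oligomer part as possible.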

When the aforementioned first-strand cDNA is made to react with such a linker, the random
oligomer part of the linker hybridizes with the 3' end of the first-strand cDNA (i.e. the 5' end
of the template mRNA). Next, the second-strand cDNA is synthesized by using this linker as
a primer and the first-strand cDNA as a template. This step can be conducted by a standard
method. In a different embodiment of the invention, the first-strand cDNA can be subtracted
by hybridization against a plurality of nucleic acids followed by physical separation of
single-stranded DNA from double-stranded DNA-DNA or DNA-RNA hybrids. Such a subtraction step can
be performed by, but is not limited to, the method disclosed in US patent publication No.
20020106666. Single-stranded cDNA retrieved from the subtraction step is used as a
template for second-strand synthesis by standard procedures similar to the aforementioned
approach omitting a subtraction step.

Then, the obtained double-strand cDNA is treated with the above Class IIS or Class III
restriction enzyme. In this step, a double-strand cDNA fragment comprising a linker-derived
part and a part derived from the 5' end of the cDNA (the 5' end of the second-strand cDNA)
is prepared. For instance, if GsuI is used as the Class IIS restriction enzyme and if the
linker is designed to locate the restriction site immediately upstream from the aforementioned
random oligomer site, the obtained DNA fragment would include a portion derived from the
5' end of the second-strand DNA (i.e. from the 5' end of the mRNA) with a
length of 16 bp (the complementary strand, however, is 14 bp). In the case of MmeI,
the length of the second-strand DNA fragment increases to 20 and 18 bp, respectively,
and in the case of EcoP15I, to 25 and 27 bp, respectively.
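
At the sequence level, the tag on the linker-primed strand is the stretch between the recognition site and the cut. The following Python sketch illustrates this for the GsuI case with the site placed at the very end of the linker. The recognition sequence CTGGAG is the one commonly listed for GsuI in enzyme catalogues and is not stated in this description; the linker and cDNA sequences are invented, and the function name is hypothetical.

```python
def five_prime_tag(strand: str, site: str, cut_distance: int) -> str:
    """Return the bases between the end of the recognition site and the cut
    on the strand that carries the site in the orientation given."""
    start = strand.find(site)
    if start < 0:
        raise ValueError("recognition site not found")
    tag_start = start + len(site)
    tag = strand[tag_start:tag_start + cut_distance]
    if len(tag) < cut_distance:
        raise ValueError("sequence ends before the cut position")
    return tag


linker = "GATCCTGGAG"               # invented linker ending in the assumed GsuI site
cdna = "ACGTTGCAAGGCTTAACCGGTTAGC"  # invented 5' end of a second-strand cDNA
tag = five_prime_tag(linker + cdna, site="CTGGAG", cut_distance=16)
print(tag, len(tag))
```

Passing 20 for MmeI or 25 for EcoP15I, together with the matching recognition sequence, would return correspondingly longer tags from the same construct.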

Next, such DNA fragments are selectively collected. If a selective binding substance (e.g.
biotin) has been bound to the linker as above, the collection can be conducted similarly to
Step 1 by using a support to which a matching selective binding substance (e.g. streptavidin)
is fixed. This procedure completes Step 2, which selectively collects fragments
containing a cDNA site, belonging to the first-strand cDNA, which at least contains a site
complementary to the 5' end of the aforementioned mRNA.

The above explains the case where the SSLLM is used for Step 2, but Step 2 can also be
carried out by any other method as long as the method can selectively collect fragments
containing the 3' end of the first-strand cDNA (the 5' end of the template mRNA). For
instance, it is possible to use an exonuclease that digests nucleotides in the 5' to 3' direction
at a controlled speed. The exonuclease treatment of the first-strand cDNA for a prescribed
time period leaves a single-strand fragment comprising the 3' end of the first-strand cDNA
(the 5' end of the template mRNA). It is possible to obtain only the targeted single-strand
fragments by conducting treatment with a nuclease that only cleaves double-strand fragments.
These fragments can be collected, joined with adapters and cloned.

The above selected fragments that correspond to the 5' end can be further ligated to linkers
and then used for PCR amplification in case that the quantity is insufficient for the 
downstream applications such as cloning. 

In one embodiment, the fragments corresponding to the 5' part of mRNAs are ligated on the 3'
end to a linker carrying another restriction enzyme site, which may be distinct from the
restriction site used in the first linker. Thereafter, the fragments corresponding to the 5' end
of mRNA contain linkers carrying recognition sites for restriction enzymes at both sides.
Such fragments can be amplified by PCR followed by subsequent cleavage by one or two
restriction enzymes to produce DNA fragments suitable for the cloning of concatemers as
described below in more detail.

In another embodiment similar to (Velculescu et al., 1995), the aforementioned DNA
fragment or PCR product is initially used for forming dimeric molecules comprised of two
5' end specific fragments ligated to one another in opposite orientation. These dimers can
then be used directly, or after a further PCR amplification, to produce concatemers as
specified in more detail below.

In yet another embodiment of the invention, as an alternative to PCR amplification, a
DNA-dependent RNA polymerase could linearly amplify fragments corresponding to 5' ends that
have appropriate linkers at both ends. DNA fragments are then reconstituted by a reverse
transcription step and a second-strand formation to allow for concatemer formation.

STEP 3 

Step 3 forms concatemers by mutually ligating the collected fragments. Since
there are multiple mRNAs and the linker hybridizes with the first-strand cDNA at the random
oligomer part as described above, the above method yields fragments derived from the cDNAs
of multiple mRNAs within a sample. Step 3 ligates these multiple fragments to form
concatemers. The ligation of the cDNA fragments can be carried out by a standard
method, using commercial ligation kits based on, but not limited to, T4 DNA ligase. The
ligation can be reliably conducted by, but is not limited to, a method which first introduces
a second linker providing a recognition site for a restriction enzyme that is distinct from
the other recognition sites used at the earlier stages, then ligates two fragments into
dimers comprising two 5' tags in opposite orientation (di-tags), and finally ligates
such di-tag fragments into concatemers, as described in more detail in
Examples 2 and 3. However, the performance of the invention does not depend on the cloning
of intermediary di-tags. As described in more detail in Example 1, monomeric tags can be
self-ligated directly to form concatemers of sufficient length to perform the invention. Thus
the invention is neither limited to nor dependent on the use of di-tags. The number of ligated
fragments is not restricted; practically any number above two, and preferably at least 20 to
30, is suitable to perform the invention. The obtained concatemers are preferably, but not
necessarily, amplified or cloned by a standard method.

The concatemers obtained in this way each comprise sites having the same base sequence
(except that uracil in RNA is thymine in DNA) as the 5' ends of the multiple
mRNAs within the sample. Although a concatemer also comprises parts derived from the linker
or linkers, the base sequence of the linker or linkers is known from the experimental design,
so the parts derived from the linker or linkers and the parts derived from mRNA can be
clearly distinguished by investigating the base sequence of the concatemer. Therefore, by
determining the base sequence of the obtained concatemer, it is possible to find out the base
sequences at the 5' end of multiple mRNAs within the sample. The base sequences of a
maximum of 16, 20 or 25 bases at the 5' end of each mRNA can be learned by the preferable
mode of using Gsu I, Mme I or EcoP15I. Information on 16, 20 or 25 bases would be
sufficient to identify the mRNA almost unambiguously in statistical terms and to judge
whether or not it is a new mRNA. In addition, by determining the base sequence of the
concatemer, it is possible to learn the base sequences at the 5' end of as many mRNAs as
there are fragments included in the concatemer (preferably 20 to 30), so information on the
5' ends of multiple mRNAs can be determined efficiently. The analysis of the concatemers can
be automated by the use of computer software to distinguish between sequences derived from
the 5' ends and sequences derived from the linker or linkers.
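
The automated analysis mentioned above can be sketched as follows. This is a minimal illustration only: it assumes that tags in a sequenced concatemer are separated by a known linker-derived spacer, and the spacer sequence, the accepted tag length window and the function name are hypothetical placeholders that would have to be taken from the linker design actually used.

```python
def split_concatemer(read, spacer, min_len=16, max_len=25):
    """Return candidate 5' end tags found between spacer copies in one read."""
    read = read.upper()
    spacer = spacer.upper()
    pieces = read.split(spacer)
    # Keep only pieces whose length is plausible for a tag; anything shorter
    # or longer is treated as vector or linker residue and dropped.
    return [p for p in pieces if min_len <= len(p) <= max_len]


# Toy read: three made-up 20-base tags joined by an assumed spacer "CCTAGA".
tags_in = ["ACGTACGTACGTACGTACGT", "TTTTGGGGAAAACCCCTTGG", "GATTACAGATTACAGATTAC"]
read = "CCTAGA".join(tags_in)
tags_out = split_concatemer(read, "CCTAGA")
```

A tag that happens to contain the spacer sequence internally would be split incorrectly by this simple approach, so a real pipeline would also need to check the expected spacing between sites.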

Sequences of specific 5' end tags obtained from concatemers as described above
can be analyzed for their identity by standard software solutions for sequence
alignment such as NCBI BLAST (http://www.ncbi.nlm.nih.gov/BLAST/), FASTA, available in
the Genetics Computer Group (GCG) package from Accelrys Inc.
(http://www.accelrys.com/), or the like. Such software solutions allow for an alignment of 5'
end specific sequence tags with one another to identify unique or non-redundant tags for
clustering and further use in database searches. All such non-redundant sequence tags can
then be individually counted and further analyzed for the contribution of each non-redundant
tag to the total number of all tags obtained from the same sample. The contribution of an
individual tag to the total number of all tags should allow for a quantification of the
transcripts within a plurality of mRNAs or a cDNA library. The results obtained in such a
way on individual samples can be further compared with similar data obtained from other
samples to compare their expression patterns with each other. Thus the invention allows
for the expression profiling of individual transcripts within one or more samples and the
establishment of a reference database.
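
The counting step described in this paragraph can be illustrated with a short sketch. It assumes that tags have already been extracted and that identical strings represent the same transcript start; the function name and the example tags are invented for illustration.

```python
from collections import Counter


def tag_frequencies(tags):
    """Count each non-redundant tag and its share of all tags in the sample."""
    counts = Counter(tags)
    total = sum(counts.values())
    # Map each unique tag to (absolute count, fraction of all tags).
    return {tag: (n, n / total) for tag, n in counts.items()}


# Made-up sample: one tag seen three times, another seen once.
sample = ["ACGTACGTACGTACGTACGT"] * 3 + ["GATTACAGATTACAGATTAC"]
freqs = tag_frequencies(sample)
```

Comparing the fractions of the same tag between two samples would then give a simple, unnormalized view of differential expression.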

Specific 5' end sequence tags obtained as described above can further be used to identify
transcribed regions within genomes for which partial or entire sequences are available. Such
a search can be performed using standard software solutions such as NCBI BLAST
(http://www.ncbi.nlm.nih.gov/BLAST/) to align the 5' end specific sequence tags to genomic
sequences. Though 20 bp tags were found to map specifically to genomic sequences, in some
cases it may be necessary to extend the initial sequence information obtained from
concatemers, for example by one of the approaches described below. The use of extended
sequences allows for a more precise identification of actively transcribed regions in the
genome. Similarly, the same approach and software solutions can be used to search for
related sequences in other databases, e.g. NCBI
(http://www.ncbi.nlm.nih.gov/Database/index.html), EMBL-EBI
(http://www.ebi.ac.uk/Databases/index.html) or DNA Data Bank of Japan
(http://www.ddbj.nig.ac.jp/).
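
Why tag length matters for specific mapping can be seen from a back-of-envelope estimate. The sketch below assumes a random genome with equal base frequencies and counts both strands; the genome size of 3,000,000,000 bp is an illustrative assumption and not a value from the text.

```python
def expected_random_hits(tag_len, genome_bp, both_strands=True):
    """Expected number of chance matches of one tag in a random genome."""
    strands = 2 if both_strands else 1
    # A given tag of length n matches a random position with probability 1/4**n.
    return strands * genome_bp / 4 ** tag_len


for n in (16, 20, 25):
    # Print tag length, number of possible tags, expected chance matches.
    print(n, 4 ** n, expected_random_hits(n, 3_000_000_000))
```

Under these simplifying assumptions a 16-base tag is expected to match by chance more than once, whereas a 20-base tag is expected to match by chance far less than once. Real genomes contain repeats, so actual specificity is lower than this estimate suggests.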

Specific 5' end sequence tags which can be mapped to genomic sequences allow for the
identification of regulatory sequences (Suzuki Y et al. EMBO Rep. 2001 May;2(5):388-93
and Suzuki Y et al. Genome Res. 2001 May;11(5):677-84). In a gene, the DNA upstream of
the 5' end of the transcribed region usually encompasses most of the regulatory elements
which are used in the control of gene expression. These regulatory sequences can be further
analyzed for their functionality by searches in databases which hold information on binding
sites for transcription factors. Publicly available databases on transcription factor binding
sites and for promoter analysis, including the Transcription Regulatory Region Database
(TRRD) (http://wwwmgs.bionet.nsc.ru/mgs/dbases/trrd4/), TRANSFAC
(http://transfac.gbf.de/TRANSFAC/), TFSEARCH
(http://www.cbrc.jp/research/db/TFSEARCH.html), and PromoterInspector provided by
Genomatix Software (http://www.genomatix.de/), provide resources for computational
analysis of promoter regions.

Sequence information obtained from 5' end specific sequence tags, or obtained by mapping
5' end sequences to a genome, can be further used to manipulate the regulation of a given
target gene. In such an experiment, promoter-related information would be used to alter the
activity of the promoter or to replace it with an artificial promoter. Alternatively, 5' end
specific tags could provide sequence information for the design of anti-sense or RNAi probes
for gene inactivation.

In a different embodiment of the invention, sequence information derived from the
concatemers can be used to synthesize specific primers for the cloning of full-length cDNAs.
In such an approach, the sequence derived from a given 5' end specific tag is used to design a
forward primer, while the choice of the reverse primer depends on the template
DNA used in the amplification reaction. Amplification by the polymerase chain reaction
(PCR) can be performed using a template derived from a plurality of RNA obtained from a
biological sample and an oligo-dT primer. In the first step, the oligo-dT primer and a reverse
transcriptase are used to synthesize a cDNA pool. In the second step, a forward primer
derived from a 5' end specific tag and an oligo-dT primer are used to amplify a full-length
cDNA from the cDNA pool. Similarly, a specific full-length cDNA can be amplified from an
existing cDNA library using a forward primer derived from a 5' end tag and a vector-nested
reverse primer.
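
When a tag is used as a forward primer alongside an oligo-dT reverse primer, it can be useful to compare rough melting temperatures of the two. The sketch below uses the Wallace rule of thumb, which is a general approximation for short oligonucleotides and is not part of the method described here; the tag sequence and the oligo-dT length are made-up examples.

```python
def wallace_tm(primer):
    """Rough melting temperature in deg C by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc


forward_primer = "ACGTACGTACGTACGTACGT"  # hypothetical 20-base 5' end tag
reverse_primer = "T" * 18                # assumed oligo-dT length

tm_forward = wallace_tm(forward_primer)
tm_reverse = wallace_tm(reverse_primer)
```

A large gap between the two values would suggest adjusting the primer design or the annealing conditions before attempting amplification.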

While the above method uses mRNA or total RNA within the sample as the starting
substrate, Step 1 can be omitted by using an existing full-length cDNA library. In this way,
information on the base sequences of the 5' ends of multiple cDNAs (i.e. the 5' ends of the
mRNAs used as templates for said cDNAs) contained in the full-length cDNA library can be
efficiently obtained in a manner similar to the above procedure.

Independently of the starting material used to perform the invention, the single-stranded
first-strand cDNA material can be fractionated by means of subtractive hybridization and
physical separation to allow for enrichment of 5' ends of differentially expressed genes or
for the concentration of transcripts of low abundance.

In some embodiments it could be desirable to obtain extended sequence information from the
5' ends of transcribed regions. Such extended sequences may allow in specific cases for the
identification of start sites of protein synthesis or a better mapping to genomic sequences.
As described above, the invention includes in Step 2 the ligation of a linker to the 5' end
of a cDNA. By introducing a single-stranded overhang that encompasses a sequence obtained
from a concatemer, so that the linker binds to and is ligated to a specific nucleic acid
fragment, such a linker can be used in a target-specific manner. After the ligation, the
linker can be used to enrich the DNA fragment by attaching the linker to a support from which
it can be released after the enrichment. The linker can further be used as a primer to obtain
extended sequence information on 5' ends in a liquid phase or on the solid phase used before
for enrichment.

By investigating the base sequences of the concatemers or extended 5' sequences obtained by
the present invention, it is not only possible to clone new genes as described above, but also
possible to investigate the expression profiles of genes within the sample. Furthermore, the
technology can be used for various purposes such as to map transcription start sites in the
genome, to map promoter usage patterns, for the analysis of SNPs in promoter regions, for
creating gene networks by combining the expression analysis with information on promoters,
alternative promoter usage and on availability of transcription factors, and for selective
collection of promoter sites within fragmented genomic DNA. To select genomic
fragments containing promoter sites, a fragment containing the same base sequence as the 5'
end of an mRNA could be bound to a support, e.g. by using the aforementioned biotin system,
and hybridized to fragmented genomic DNA. Hybridized genomic DNA fragments could
then be separated from a mixture of genomic fragments by using e.g. streptavidin-coated
magnetic beads, and cloned under standard conditions.

Alternatively, concatemer cloning could be avoided by making and using selected 5' end tags
ligated to a mixture of full-length cDNAs and bound to magnetic beads carrying a
homogeneous sequence of oligonucleotides, followed by ligation such as in the SSLLM,
second-strand cDNA preparation and cleavage with a Class IIS or Class III restriction
enzyme. The 5' end specific tag would be anchored specifically to the beads and would be
used for specific sequencing in a manner similar to that of Lynx Therapeutics (US patent Nos.
6,352,828; 6,306,597; 6,280,935; 6,265,163; and 5,695,934).

For instance, the oligonucleotides would have a "random part", which will bind to the 5' ends
of cDNAs, and a code part, which will be able to "tag" the ligation
product. The oligonucleotide may be destroyed by exonuclease VII if not hybridized with a
cDNA. The "decoder" oligonucleotides would be used to select out the sequence. The
specific arrays of cDNAs on beads are then arrayed onto a solid surface, one per position,
followed by parallel sequencing. The aforementioned approach would allow for the design of
a liquid array format, in which each bead could be addressed by an independent label and
processed individually for sequence analysis or the like.

In a different embodiment of the invention, known 5' end specific tags can be used for an
alternative analysis of 5' end specific sequences that omits the cloning and sequencing of
concatemers. In such a case, 5' end specific oligonucleotides of about 25 bp would be
synthesized and fixed to a solid support to form a 5' end specific microarray. The
hybridization of 5' tags obtained from a sample would then allow for the identification and
quantification of transcripts present in the sample. Standard methods for the preparation and
use of microarrays are known to a person trained in the art of molecular biology
(Jordan B., DNA Microarrays: Gene Expression Applications, Springer-Verlag, Berlin
Heidelberg New York, 2001; Schena A, DNA Microarrays, A Practical Approach, Oxford
University Press, Oxford 1999).

Through modifications such as the aforementioned approaches for direct sequencing of 5' ends
or a readout by hybridization to a 5' end specific microarray, the invention provides different
means for the general analysis of 5' ends in the form of concatemers or for the analysis of
individual 5' ends which were enriched by means of a 5' end specific selection.

Fig. 2 summarizes the exemplary work flow according to Steps 2 and 3 discussed above. 

In Fig. 2, the restriction enzymes Xma JI, Mme I and Xba I are used for the cloning of 33 bp
DNA fragments as described in more detail in Example 1 below. In principle, the cloning
of 5' end specific tags comprises the following steps.

In the initial step of the invention outlined in Fig. 1, a pool of single-stranded cDNA is
obtained. The pool comprises the 5' end regions transcribed from the mRNAs. Adjacent to
the portion of the single-stranded cDNA which contains the 5' end regions transcribed from
the mRNAs, a specific linker, here denoted as "1st Linker", is ligated to provide a recognition
site for a restriction enzyme that cleaves outside the 1st Linker with respect to its binding
site, i.e. within the 5' end transcribed region. For the purpose of the example described in
the figure, the restriction enzyme Mme I is used as it cleaves 21 bp downstream of the
recognition site, thus defining the end of the tags which comprise the 5' ends of
transcribed regions of mRNAs. In addition, a recognition site for a second restriction enzyme
is provided in the "1st Linker". For the purpose of this example, Xma JI is used for the later
cloning of the 5' end specific tags.

Subsequently, the "1st Linker" is used to prime the synthesis of a second complementary
cDNA strand, resulting in double-stranded cDNA molecules which comprise the 5' ends of
transcribed regions of the mRNAs and which have a recognition site for restriction enzymes
that cleave at a site located outside the 1st Linker with respect to its binding site, adjacent
to the region containing the 5' end regions transcribed from the mRNAs.

The aforementioned restriction enzyme that cleaves outside of its binding site is, for the
purpose of this example, Mme I. Cleavage with Mme I results in double-stranded cDNA
fragments which comprise the tags containing the 5' ends of transcribed regions of the mRNAs
and the "1st Linker", and which have a single-stranded DNA overhang at the cleavage site of
Mme I.
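
The effect of a restriction enzyme that cleaves at a fixed distance from its recognition site can be mimicked in silico. The sketch below is an illustration under stated assumptions: it treats the molecule as a single top-strand string, ignores the single-stranded overhang, uses the Mme I recognition sequence TCCRAC, and takes the cutting distance as a parameter (21 is the figure quoted in the text above). The function name and the cDNA sequence are invented.

```python
import re

MME1_SITE = re.compile(r"TCC[AG]AC")  # Mme I recognition sequence TCCRAC


def cut_downstream(seq, reach):
    """Return the bases between the first Mme I site and a cut `reach` bases past it."""
    seq = seq.upper()
    m = MME1_SITE.search(seq)
    if m is None or m.end() + reach > len(seq):
        return None  # no site found, or molecule too short to be cut
    return seq[m.end():m.end() + reach]


linker_end = "CCTAGGTCCGAC"            # Xma JI and Mme I sites placed at a linker's 3' end
cdna = "GATTACAGATTACAGATTACAGATTACA"  # made-up cDNA 5' sequence
tag = cut_downstream(linker_end + cdna, reach=21)
```

Because the cut position depends only on the recognition site in the linker, every molecule yields a tag of the same length regardless of its cDNA sequence.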

To the aforementioned single-stranded DNA overhang at the cleavage site of Mme I, a "2nd
Linker" is ligated to provide a recognition site for a restriction enzyme suitable for the
cloning of the cDNA fragments or tags, which function as templates for amplification by
means of PCR.

The cDNA fraction comprising the "1st Linker", cDNA fragments comprising the 5' ends of
regions transcribed from the mRNAs, and the "2nd Linker" is purified by selective binding to
a support by means of a selective binding substance attached to the 1st Linker.

For the purpose of cloning the cDNA fragments or tags comprising the 5' ends of transcribed
regions, the aforementioned cDNA fraction comprising the "1st Linker", cDNA
fragments or tags which comprise the 5' end regions transcribed from mRNA, and the "2nd
Linker" is amplified by means of PCR, and the linker portions are cleaved off by restriction
enzymes to allow for the ligation of the tags into concatemers. For the purpose of this
example, the restriction enzymes Xma JI and Xba I are used, which cleave out a 33 bp
fragment from the aforementioned cDNA fragments. After an appropriate purification step,
the 33 bp fragments are either ligated to each other to form concatemers comprising,
for example, up to 30 tags comprising the 5' ends of transcribed regions of said mRNAs, or
cloned individually.

The concatemers can be cloned into a sequencing vector to prepare a library comprising the 
5' end regions transcribed from mRNA. 

Fig. 3 shows a workflow in principle according to the present invention to illustrate an
alternative approach for the direct sequencing of 5' end tags. For the purpose of this
embodiment of the invention, the single-stranded cDNAs which comprise the 5' end regions
transcribed from the mRNAs and which are obtained as summarized in Fig. 1 are ligated to a
linker, here denoted as "1st Linker", which for the purpose of this example has a specific
label to allow for the immobilization of the ligation product on a solid support. This linker
can be used as a primer for the synthesis of a 2nd strand cDNA complementary to the first
strand. The single-stranded DNAs having a double-stranded linker adjacent to the region
comprising the 5' end regions transcribed from the mRNAs, or double-stranded DNA comprising
the 5' end transcribed regions, can be forwarded for individual or parallel sequencing, for
the purpose of this example by a high-throughput serial sequencing approach for the 5' ends
of mRNAs.

The present invention will now be described by way of examples thereof. It should be noted
that the present invention is not restricted to the Examples. The experiments described in the
Examples can be performed by any person experienced in standard
techniques in the field of molecular biology. Unless otherwise defined in the text, the
technical terms, abbreviations, and solutions used in the Examples should have the same
meaning as commonly understood by a person experienced in the state of the art in the field
of the invention. A general description of such terms, abbreviations and solutions can be
found in the common reagent section in Molecular Cloning (Sambrook and Russell, 2001).
All publications mentioned herein are incorporated into this document by reference to
disclose and describe the methods and/or materials therein.

Examples 

Example 1: Preparation of 5' end specific tags according to the invention omitting di-tags

To perform the invention, mRNA or total RNA samples can be prepared by standard methods
known to a person trained in the art of molecular biology, as for example given in more detail
in Sambrook and Russell, 2001. Carninci P. et al. (Biotechniques 33, 306-9, (2002)) described
one such method, used herein to obtain cytoplasmic mRNA fractions; however, the invention
is not limited to this method, and any other approach for the preparation of mRNA or total
RNA should allow for the performance of the invention in a similar manner.

The preparation of mRNA from total RNA or cytoplasmic RNA is preferable but not
essential to perform the invention, as the use of total RNA can provide satisfactory results in
combination with the cap-selection step described below in this example. Generally speaking,
mRNA represents about 1-3% of total RNA preparations, and it can be subsequently
prepared by using commercial kits based on oligo-dT cellulose matrices. Such commercial
kits, including but not limited to the MACS mRNA isolation kit (Miltenyi), provided
satisfactory mRNA yields under the recommended conditions when applied for the
preparation of mRNA fractions for performing the invention. To perform the invention, one
cycle of oligo-dT mRNA selection is sufficient, as extensive mRNA purification can
cause the loss of long mRNAs in particular.

All mRNA samples used to perform the invention were analyzed for their ratios of the OD
readings at 230, 260 and 280 nm to monitor the mRNA purity. Removal of polysaccharides
was considered successful when the 230/260 ratio was lower than 0.5, and an effective
removal of proteins was obtained when the 260/280 ratio was higher than 1.8 or around 2.0.
The RNA samples were further analyzed by electrophoresis in an agarose gel to confirm a
good ratio between the 28S and 18S rRNA in total RNA preparations.
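
The two purity criteria above translate directly into a simple check. The sketch below is illustrative only; the function name and the example readings are invented, and the thresholds are the ones quoted in the preceding paragraph.

```python
def rna_purity_ok(od230, od260, od280):
    """Apply the OD ratio thresholds quoted in the text to one RNA sample."""
    polysaccharides_removed = od230 / od260 < 0.5  # 230/260 ratio below 0.5
    proteins_removed = od260 / od280 > 1.8         # 260/280 ratio above 1.8
    return polysaccharides_removed and proteins_removed


# Made-up readings: 230/260 = 0.4 and 260/280 = 2.0.
sample_passes = rna_purity_ok(od230=0.4, od260=1.0, od280=0.5)
```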

The first-strand cDNA was prepared from different mRNA samples using Superscript II
(Invitrogen) under the following conditions:

In a final volume of 22 µl, 5-25 µg of purified mRNA or up to 50 µg of total RNA were
mixed with 14 µg of the appropriate purified 1st strand cDNA primer (5'-
(GA)5AAGGATCCTGCCATTTCAT^

3') (SEQ ID NO: 1) and heated to 65°C for 10 min to allow for annealing of the primer, and
afterwards immediately placed on ice.

In a second tube the reaction mixture for the first-strand synthesis was prepared with a final
volume of 128 µl:

2 x GC I (LA Taq) buffer (TaKaRa)                      75 µl
dATP, dTTP, dGTP, and 5-methyl-dCTP, 10 mM each         4 µl
4.9 M sorbitol                                         20 µl
Saturated trehalose (approximately 80%)                10 µl
Superscript II reverse transcriptase (200 U/µl)        15 µl
ddH2O                                                   4 µl
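
As a quick consistency check, the component volumes of the first-strand reaction mixture can be summed and compared with the stated final volume of 128 µl. The sketch below also shows how the recipe could be scaled; the dictionary keys are shortened labels and the helper name is hypothetical.

```python
first_strand_mix_ul = {
    "2x GC I (LA Taq) buffer": 75,
    "dATP, dTTP, dGTP, 5-methyl-dCTP (10 mM each)": 4,
    "4.9 M sorbitol": 20,
    "saturated trehalose": 10,
    "Superscript II reverse transcriptase": 15,
    "ddH2O": 4,
}

# The sum of the components should equal the stated final volume.
total_ul = sum(first_strand_mix_ul.values())


def scale_mix(mix, factor):
    """Scale every component volume by the same factor."""
    return {name: vol * factor for name, vol in mix.items()}


double_mix = scale_mix(first_strand_mix_ul, 2)
```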



A third reaction tube with 1.5 µl of α-32P-dGTP (Amersham Pharmacia Biosciences BioTech)
was prepared, and the reaction mixture, the reaction tube holding the radioactive
tracer and the RNA template were heated to 42°C. When all solutions had reached the
starting temperature of 42°C, the reaction mixture and the RNA template were mixed quickly,
and 40 µl of this solution were transferred into the reaction tube holding the radioactive
tracer. The remaining reaction mixture with the RNA can be processed in parallel with the
radioactive reaction mixture. The first-strand cDNA synthesis was performed in a
thermocycler with the following settings: 42°C for 30 min; 50°C for 10 min; and 55°C for 10
min. After the cycle had concluded, the reaction was stopped by adding EDTA solution
(from a stock of 0.5 M) to a final concentration of 10 mM. It is not essential for the
performance of the invention to include a radioactive tracer during the first-strand cDNA
synthesis, though it can be very helpful for measuring the synthesis rate of the reaction and
for analyzing the cDNA, e.g. by alkali gel electrophoresis. Radioactive and non-radioactive
materials can be mixed in a new tube and processed together in the following steps. Remaining
enzyme activity in the reaction mixture was destroyed by adding proteinase K to a final
concentration of 1 µg/µl and incubating at 50°C for 15 min or longer. RNA and first-strand
cDNA were isolated from the reaction mixture by precipitation with CTAB urea followed by
ethanol, as described below. To a reaction mixture of about 128 to 142 µl, 32 µl of 5 M
sodium chloride and 320 µl of a 1% CTAB (cetyl trimethyl ammonium bromide) solution in
4 M urea were added and mixed carefully. The solution was incubated at room temperature
for 10 min before the precipitate was isolated by centrifugation at 15,000 rpm for 10 min.
The supernatant was removed and the pellet carefully re-suspended in 100 µl of 7 M
guanidine chloride. For the ethanol precipitation, 250 µl of absolute ethanol were added and
the mixture was left at -80°C for 60 min to allow for the formation of the precipitate. The
precipitate was collected by centrifugation at 15,000 rpm for 10 min and subsequently washed
twice with 800 µl of 80% ethanol. Finally the pellet was re-suspended in 46 µl of water.
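
Several steps in this protocol specify a stock and a final concentration rather than a volume, for example EDTA from a 0.5 M stock to a final 10 mM. The sketch below solves C1*V1 = C2*V2 for the volume to add, taking the dilution by the added stock into account. The sample volume of 150 µl used in the example is an assumption for illustration, and the function name is hypothetical.

```python
def stock_volume_to_add(sample_ul, stock_mM, final_mM):
    """Volume of stock to add so that the mixture reaches final_mM."""
    if final_mM >= stock_mM:
        raise ValueError("final concentration must be below the stock concentration")
    # stock_mM * v = final_mM * (sample_ul + v), solved for v.
    return sample_ul * final_mM / (stock_mM - final_mM)


# EDTA: 0.5 M stock (500 mM) to a final 10 mM in an assumed 150 µl reaction.
edta_ul = stock_volume_to_add(sample_ul=150, stock_mM=500, final_mM=10)
```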

In the example described here the invention made use of the so-called cap trapper method
for full-length cDNA selection. As the invention is not limited in its performance to the cap
trapper method, other means for full-length selection can be applied in a similar way. The cap
trapper selection was initiated by biotinylation of the cap structure at the 5' end of mRNA
molecules. To the aforementioned first-strand cDNA solution, 3.3 µl of 1 M sodium acetate
buffer, pH 4.5, and freshly prepared 10 mM NaIO4 solution, to a final concentration of 1 mM,
were added and the volume was brought up to a final volume of 55 µl. The mixture was
incubated on ice and in darkness for 45 min, and the reaction was then quenched by the
addition of 1 µl of 80% glycerol. RNA and cDNA were isolated from the reaction mixture
by precipitation with isopropanol. To the aforementioned reaction mixture, 0.5 µl of 10% SDS,
11 µl of 5 M sodium chloride and 61 µl of isopropanol were added, mixed carefully and
incubated at -80°C for 30 min in total darkness. After collecting the precipitate by
centrifugation for 15 min at 15,000 rpm, the pellet was washed twice with 500 µl of 80%
ethanol. The pellet was finally re-suspended in 50 µl of water. The oxidized diol groups in
the mRNA were used to introduce biotin moieties in a reaction with biotin hydrazide. To the
aforementioned 50 µl RNA/cDNA solution, 160 µl of biotin hydrazide long arm (Vector
Laboratories) dissolved at 10 mM concentration in a reaction buffer containing 50 mM
sodium citrate buffer, pH 6.1, and 0.1% w/v SDS were added to a final volume of 210 µl.
The reaction was performed overnight at room temperature to allow for a complete
modification of all oxidized diol groups. The reaction was terminated by the precipitation of
the RNA and cDNA, for which 75 µl of 1 M sodium citrate, pH 6.1, 5 µl of 5 M sodium
chloride and 750 µl of absolute ethanol were added to the reaction mixture. After incubation
for 1 h at -80°C the precipitate was collected by centrifugation at 15,000 rpm for 10 min. The
resulting pellet was washed twice with 500 µl of 80% ethanol and finally re-suspended in 175
µl of TE buffer (1 mM Tris, pH 7.5, 0.1 mM EDTA).

Full-length cDNAs were further processed from the aforementioned solution by the addition
of 20 µl of RNase I buffer (Promega) and 1 unit of RNase I (Promega, 5 or 10 U/µl) per 1
µg of starting mRNA or total RNA. The reaction mixture with a final volume of 200 µl was
incubated at 37°C for 30 min before the reaction was stopped by the addition of 4 µl of a
10% SDS solution and 3 µl of a 10 µg/µl proteinase K solution. To destroy the RNase I, the
reaction mixture was further incubated at 45°C for an additional 15 min. The reaction mixture
was then extracted once with 1:1 Tris (pH 7.5)-equilibrated phenol:chloroform before the
precipitation of the RNA and DNA. For an improved yield of the precipitation, 20 µg of
carrier tRNA and 1 volume of isopropanol were added to the reaction mixture, which was then
incubated at -20°C. The precipitate was collected by centrifugation at 15,000 rpm for 10 min,
washed with 500 µl of 80% ethanol and finally re-suspended in 20 µl of 0.1x TE buffer.

For the isolation of full-length cDNAs, magnetic beads coated with streptavidin were used in
this example. However, the invention is not limited to the use of magnetic beads, as any other
solid phase coated with streptavidin or avidin could be used in a similar fashion. To minimize
the non-specific binding of nucleic acids to the surface of the magnetic beads, these were
pre-incubated before use with DNA-free tRNA. To about 500 µl of magnetic bead slurry (MPG
particle, CPG, New Jersey), about 100 µg of tRNA in 10 µl of water was added and incubated
on ice for some 30 min with occasional mixing. The magnetic beads were separated from the
solution by applying a magnetic force for about 3 min. After the supernatant was removed,
the beads were washed three times with 500 µl of a binding buffer containing 4.5 M sodium
chloride and 0.05 M EDTA to remove free streptavidin from the solution. The beads were
then re-suspended in 500 µl of the binding buffer, and 350 µl of this slurry were
mixed with the aforementioned RNase-treated cDNA. The resulting slurry was incubated
under ongoing agitation at 50°C for 10 min before adding an additional 150 µl of the
streptavidin-coated magnetic beads. The resulting slurry was again incubated under ongoing
agitation for another 20 min at 50°C. Biotinylated full-length mRNA/cDNA hybrids were
retained on the magnetic beads and separated from the supernatant by applying a magnetic
force. In doing so the beads were washed carefully twice with 500 µl of the binding buffer,
once with 500 µl of 0.3 M sodium chloride containing 1 mM EDTA, and finally twice with
500 µl of a buffer containing 0.4% SDS, 0.5 M sodium acetate, 20 mM Tris-HCl pH 8.5, and
1 mM EDTA. Single-stranded cDNAs were released from the beads by alkali treatment of
the mRNA/DNA hybrids, applying 100 µl of 50 mM sodium hydroxide containing 5 mM
EDTA with 5 min incubation at room temperature. During this incubation time the slurry was
occasionally mixed. The supernatant was removed and the elution was repeated twice under
the same conditions. All three supernatants were pooled and placed on ice immediately. The
eluted fractions, about 150 µl, were neutralized by addition of 50 µl of 100 mM Tris pH 8.0,
followed by phenol/chloroform extraction and precipitation. The resulting solution of about
200 µl was then treated with RNase I and proteinase K as described above, extracted once
with the same volume of Tris-equilibrated phenol:chloroform (ratio 1:1), and out of the
aqueous phase the DNA was precipitated with ethanol by adding to the 250 µl sample 12.5 µl of
5 M sodium chloride, 3.5 µl of 1 µg/µl glycogen, and 250 µl of isopropanol. After incubation
at -80°C for some 30 min, the DNA was collected by centrifugation at 15,000 rpm for 20 min.
After the pellet had been washed twice with 500 µl of 80% ethanol, the DNA was finally
re-suspended in 5 µl of 0.1x TE buffer.

For the next step described in this example, a specific linker having a recognition site for
the Class IIS restriction enzyme Mme I along with recognition sites for the restriction enzymes
Xho I, I-Ceu I, and Xma JI was designed. However, the invention is not limited to the use of
the restriction enzymes given in this example, and the use of other enzymes is described later
in a different example. The double-stranded linker was assembled from two upper strand
oligonucleotides with random overhangs and a shorter lower strand oligonucleotide. Note
that for the upper strand oligonucleotides, a 4:1 mixture of two oligonucleotides with distinct
overhangs was used. The oligonucleotides named below were obtained from Invitrogen
Japan and gel purified before annealing. The different end-modifications of the
oligonucleotides are indicated below, where "Bio" stands for 5' biotinylated, "Pi" stands for
5' phosphorylated, and "NH2" stands for a 3' amino group. The same abbreviations will be
used later in the text for other oligonucleotides:

Upper oligonucleotide GN5:
Bio-agagagagacctcgagtaactataacggtcctaaggtagcgacctaggtccgacgNNNNN (SEQ ID NO: 2)
Upper oligonucleotide N6:
Bio-agagagagacctcgagtaactataacggtcctaaggtagcgacctaggtccgacNNNNNN (SEQ ID NO: 3)
Lower oligonucleotide:
Pi-gtcggacctaggtcgctaccttaggaccgttatagttactcgaggtctctctct-NH2 (SEQ ID NO: 4)

The oligonucleotides were mixed at a ratio of 4x GN5 : 1x N6 : 5x "Lower" at a concentration of
2 µg/µl in 100 mM sodium chloride. For annealing, the mixture was incubated at 65 °C
followed by additional incubations at 45 °C for 5 min, at 37 °C for 10 min, and at 25 °C for 10
min. For ligation of the linker to the single-stranded cDNA, 2 µg of linker per 1 µg of cDNA
were used.
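
The pairing of the three linker oligonucleotides can be checked with a few lines of Python. The sketch below is only an illustration and not part of the described method: it confirms that the lower strand is the reverse complement of the first 54 nt of each upper strand, reports the single-stranded tail that remains, and looks for the XhoI, XmaJI and Mme I recognition sequences. The site strings (ctcgag, cctagg, tccgac) are supplied here as assumptions from general knowledge of these enzymes; the longer I-CeuI site is left out.

```python
# Sketch: check strand pairing and recognition sites of the first linker.
COMPLEMENT = {"a": "t", "c": "g", "g": "c", "t": "a", "n": "n"}

UPPER_GN5 = "agagagagacctcgagtaactataacggtcctaaggtagcgacctaggtccgacgNNNNN"
UPPER_N6 = "agagagagacctcgagtaactataacggtcctaaggtagcgacctaggtccgacNNNNNN"
LOWER = "gtcggacctaggtcgctaccttaggaccgttatagttactcgaggtctctctct"

# Recognition sequences assumed for this illustration (not quoted from the text).
SITES = {"XhoI": "ctcgag", "XmaJI": "cctagg", "MmeI": "tccgac"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence in lower case."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.lower()))


def overhang(upper: str, lower: str) -> str:
    """Return the single-stranded 3' tail of `upper` after pairing with `lower`."""
    paired = len(lower)
    if reverse_complement(upper[:paired]) != lower.lower():
        raise ValueError("strands are not complementary")
    return upper[paired:]


if __name__ == "__main__":
    for name, upper in (("GN5", UPPER_GN5), ("N6", UPPER_N6)):
        print(name, "overhang:", overhang(upper, LOWER))
    for enzyme, site in SITES.items():
        print(enzyme, "site starts at index", UPPER_N6.find(site))
```

In this model the Mme I site ends exactly at the last paired base of the linker, so the enzyme would cut within the adjacent cDNA sequence.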

In a final volume of 7.5 µl of 0.1x TE, the aforementioned cDNA and the aforementioned
linker were mixed and incubated at 65 °C for 5 min to melt secondary structures in the cDNA.
The double-stranded linker was then ligated to the single-stranded cDNA using a TaKaRa
ligation kit, version 2. From the kit, 7.5 µl of "Solution II" and 15 µl of "Solution I" were
added to the aforementioned annealing reaction mixture, mixed, and incubated for 10 h at
16 °C. The ligation reaction was terminated by adding 1 µl of 0.5 M EDTA, 1 µl of 10% SDS,
1 µl of 10 mg/ml proteinase K, and 10 µl of water. After incubation at 45 °C for 15 min, the
resulting mixture was extracted with a three-fold excess of Tris-equilibrated
phenol/chloroform. The remaining excess of free linker was removed from the reaction
mixture by gel filtration of the solution on an S-300 spin column (Amersham Pharmacia
Biosciences) according to the manufacturer's instructions. Briefly, the S-300 columns were
transferred into a centrifugation tube and spun at 3,000 rpm for 1 min to remove the storage
buffer from the column. After placing the column in a new centrifugation tube, the DNA
sample (about 60 µl) followed by another 40 µl of water were added to the column and the
column was spun at 3,000 rpm for 5 min at 4 °C to collect the flow-through. To concentrate
the DNA, the eluate from the S-300 column was placed on a Microcon 100 membrane
(Amicon) and centrifuged until a final volume of 10 µl was achieved. The membrane was
washed once with 10 µl of 0.1x TE at 65 °C for 3 min and the fractions were combined for use in
the following second-strand synthesis.

For the second-strand cDNA synthesis a thermostable DNA polymerase was applied. As this
reaction was performed at a high temperature, an excess of upper primer was added to the
reaction mixture. This primer was obtained from Invitrogen Japan and gel purified before use.
The sequence of the primer resembles that of the upper linker oligonucleotides described above,
though no random overhang was included:
5'-Bio-agagagagacctcgagtaactataacggtcctaaggtagcgacctaggtccgacg (SEQ ID NO: 5).

The reaction mixture was set up by mixing the following components:

• cDNA sample                        10 µl
• 100 ng/µl second-strand primer     6 µl
• 5X A buffer (NEB)                  7.2 µl
• 5X B buffer (NEB)                  4.8 µl
• 2.5 mM dNTPs (Takara)              6 µl
• ddH2O                              up to 45 µl



The reaction mixture was heated to 65 °C before 15 µl of 1 U/µl ELONGASE (Invitrogen)
were added, and the reaction was performed in a thermocycler with the following settings: 5 min
at 65 °C, 30 min at 68 °C, and 10 min at 72 °C. The polymerase reaction was terminated by
adding 1 µl of 0.5 M EDTA, 1 µl of 10% SDS, and 1 µl of 10 mg/ml proteinase K. After
incubation at 45 °C for 15 min, the resulting mixture was extracted with the same volume of
Tris-equilibrated phenol/chloroform (ratio 1:1). The remaining excess of free primer was
removed from the reaction mixture by gel filtration of the solution on an S-300 spin column
(Amersham Pharmacia Biosciences) according to the manufacturer's instructions. Briefly, the
S-300 columns were transferred into a centrifugation tube and spun at 3,000 rpm for 1 min to
remove the storage buffer from the column. After placing the column in a new centrifugation
tube, the DNA sample (about 60 µl) followed by another 40 µl of water were added to the
column and the column was spun at 3,000 rpm for 5 min at 4 °C to collect the flow-through.
To concentrate the DNA, the eluate from the S-300 column was placed on a Microcon 100
membrane (Amicon) and centrifuged until a final volume of 10 µl was achieved. The
membrane was washed once with 10 µl of 0.1x TE at 65 °C for 3 min and the fractions were
combined for use in the next step.

In the next step, the resulting double-stranded cDNA was cleaved with a Class IIS restriction
enzyme, which for the purpose of this example was Mme I. The reaction was set up by
mixing the following components in a final volume of 100 µl:

• ddcDNA                                     50 µl
• 10X reaction buffer (NEB)                  10 µl
• MmeI (2 U/µl, equal to 3 U/µg DNA)         1.5 µl
• 10X SAM                                    2 µl
• ddH2O                                      to final volume of 100 µl

After incubation at 37 °C for 1 h, the reaction was terminated by adding 2 µl of 0.5 M EDTA, 2
µl of 10% SDS, and 2 µl of 10 µg/µl proteinase K, followed by a further incubation at 45 °C
for another 15 min. The reaction mixture was then extracted once with the same volume of
Tris-equilibrated phenol:chloroform (ratio 1:1), and the DNA was precipitated from the
aqueous phase with isopropanol by adding to 150 µl of the sample 7.5 µl of 5 M sodium chloride,
3 µl of 1 µg/µl glycogen, and 150 µl of isopropanol. After incubation at -80 °C for about 30
min, the DNA was collected by centrifugation at 15,000 rpm for 20 min. After the pellet had
been washed twice with 500 µl of 80% ethanol, the DNA was finally re-suspended in 2 µl
of 0.1x TE buffer.

After the double-stranded cDNA had been cleaved with the Class IIS restriction enzyme MmeI,
a second linker was ligated to the 2 bp overhang at the cleavage site. This second linker was
composed of the following two oligonucleotides of 45 bp length and had an XbaI
recognition site, which was used in this example for later cloning. However, the invention is
not limited to the use of XbaI, as other restriction enzymes can be applied for this step with
similar efficiency.

Upper-XbaI: Pi-tctagatcaggactcttctatagtgtcacctaaagtctctctctc-NH2 (SEQ ID NO: 6)
Lower-XbaI: gagagagagactttaggtgacactatagaagagtcctgatctagaNN (SEQ ID NO: 7)

The two oligonucleotides were obtained from Espec and purified by acrylamide
electrophoresis before being annealed. For annealing, a mixture of 2 µg/µl of each
oligonucleotide in 100 mM sodium chloride was incubated at 65 °C followed by additional
incubations at 45 °C for 5 min, at 37 °C for 10 min, and at 25 °C for 10 min.

The double-stranded linker was then ligated to the cDNA in a reaction mixture containing 2
µl of the aforementioned cDNA solution, 4 µl of the annealed linker DNA (0.4 µg/µl), and 8 µl
of water. Before adding the ligase, the reaction mixture was incubated at 65 °C for 2 min
followed by a brief incubation on ice. Then 2 µl of 10X reaction buffer (NEB), 2 µl of T4
DNA ligase (NEB, 40 U/µl), and 2 µl of water were added, followed by an incubation at
16 °C for 16 h. The ligation reaction was terminated by heating the reaction mixture to 65 °C
for 5 min.
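
The Mme I cleavage and the 2 bp overhang used for ligating the second linker can be illustrated with a small simulation. This is a simplified sketch, not the procedure itself: it assumes the commonly cited Mme I specificity (recognition sequence TCCRAC, cut 20 nt downstream on the top strand and 18 nt downstream on the bottom strand), looks only at the top strand, and uses an invented cDNA sequence.

```python
import re

# Assumed Mme I specificity: TCCRAC (R = A or G), cutting 20 nt downstream on
# the top strand and 18 nt downstream on the bottom strand (2-nt 3' overhang).
MMEI_SITE = re.compile(r"tcc[ag]ac")
TOP_CUT = 20
BOTTOM_CUT = 18

# Double-stranded part of the first linker, as listed in this example.
LINKER = "agagagagacctcgagtaactataacggtcctaaggtagcgacctaggtccgac"


def mmei_digest(top_strand: str):
    """Cut at the first Mme I site found on the top strand.

    Returns (top_fragment, paired_length), or None if there is no site or
    too little sequence downstream. Sites in the opposite orientation are
    ignored in this simplified model.
    """
    seq = top_strand.lower()
    match = MMEI_SITE.search(seq)
    if match is None or match.end() + TOP_CUT > len(seq):
        return None
    return seq[: match.end() + TOP_CUT], match.end() + BOTTOM_CUT


if __name__ == "__main__":
    cdna_5prime = "gacgtacgtacgtacgtacgtacgtacgt"  # hypothetical sequence
    fragment, paired = mmei_digest(LINKER + cdna_5prime)
    tag = fragment[len(LINKER):]
    print("fragment:", len(fragment), "nt; tag:", len(tag), "nt;",
          "overhang:", len(fragment) - paired, "nt")
```

Under these assumptions the released fragment carries 20 nt of cDNA sequence next to the 54 nt linker. Adding the 45 nt second linker gives 119 nt, which appears consistent with the 119 bp band purified later in this example.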

Ligation products having biotin moieties at the 5' end were separated from non-modified
DNA, for which the ligation to the first linker had failed. Streptavidin-coated magnetic beads
(Dynabeads) were used at this point in a similar way as described before. About 200 µl of the
original slurry were incubated under occasional agitation with 5 µg of tRNA in a volume of
200 µl for about 20 min at room temperature. After collection of the beads by magnetic
force, the beads were washed three times with 200 µl of a buffer containing 1 M sodium
chloride, 0.5 mM EDTA, and 5 mM Tris-HCl pH 7.5, before being re-suspended in 200 µl of
the same buffer. After the washing steps the beads were mixed with the aforementioned
ligation product, and the resulting slurry was incubated under continuous agitation at room
temperature for 15 min to allow binding of the modified DNA to the beads. After the
binding reaction was completed, the beads were collected by applying a magnetic force and the
supernatant was removed completely. While being held at the bottom of the tube by the
magnetic force, the beads were rinsed twice with 200 µl of 1x B&W buffer (10 mM Tris pH
7.5, 1 mM EDTA, 2 M sodium chloride) plus 1x BSA buffer (1 mg/ml, provided by NEB),
twice with 200 µl of 1x B&W buffer, and finally twice with 200 µl of 0.1x TE.

DNA fragments bound to the magnetic beads by means of the biotin-streptavidin
interaction were released from the beads by treatment with an excess of free biotin. A fresh
biotin stock (Sigma) was directly prepared to a final concentration of 1.5% (w/v) in 4 M
guanidine thiocyanate, 25 mM sodium citrate, pH 7.0, and 0.5% sodium N-lauroylsarcosinate.
The aforementioned beads were re-suspended in 50 µl of the biotin solution and incubated at
45 °C for 30 min under occasional agitation. The supernatant was separated from the beads by
applying a magnetic force and collected in a separate tube. The elution step was repeated
three times under the same conditions as described above, and all fractions were pooled for
the isolation of the cDNA by isopropanol precipitation. For isopropanol precipitation, about
250 µl of the sample were mixed with 12.5 µl of 5 M sodium chloride, 3.5 µl of a 1 µg/µl
glycogen solution, and 250 µl of isopropanol. After incubation at -80 °C for 30 min the
precipitate was collected by centrifugation at 15,000 rpm for 15 min, and the pellet was
washed twice with 500 µl of 80% ethanol before being re-suspended in 50 µl of 0.1x TE.

The DNA was further purified by gel filtration on a G50 spin column (Amersham Pharmacia
Biosciences) according to the manufacturer's directions, followed by RNase I and proteinase K
treatment. To about 100 µl of sample derived from the gel filtration, 2 µl of RNase I (Promega)
were added. The resulting reaction mixture was incubated for 10 min at 37 °C, followed by the
addition of 2 µl of 10 µg/µl proteinase K, 2 µl of 0.5 M EDTA, and 2 µl of 10% SDS, and an
additional incubation of 15 min at 45 °C. The reaction mixture was then extracted once with
the same volume of Tris-equilibrated phenol:chloroform (ratio 1:1), and the DNA was
precipitated from the aqueous phase with isopropanol by adding to 150 µl of the sample 7.5 µl of
5 M sodium chloride, 3 µl of 1 µg/µl glycogen, and 150 µl of isopropanol. After incubation at
-80 °C for about 30 min, the DNA was collected by centrifugation at 15,000 rpm for 20 min.
After the pellet had been washed twice with 500 µl of 80% ethanol, the DNA was finally
re-suspended in 20 µl of 0.1x TE buffer.

Before cloning, the DNA fragments were amplified by a PCR step using the following two
linker-specific primers, which were obtained from Invitrogen Japan:

Primer 1 (uni-PCR)
5' Bio-gagagagagactttaggtgacacta 3' (SEQ ID NO: 8)

Primer 2 (MmeI-PCR)
5' Bio-agagagagacctcgagtaactataa 3' (SEQ ID NO: 9)

The PCR amplification was performed in a total volume of 50 µl with the following setup:

• DNA sample                  1 µl
• 10X buffer                  5 µl
• DMSO                        3 µl
• 2.5 mM dNTPs                12.5 µl
• Primer 1 (350 ng/µl)        0.5 µl
• Primer 2 (350 ng/µl)        0.5 µl
• ddH2O                       27.5 µl
• ExTaq (5 U/µl, TaKaRa)      0.5 µl



After an initial incubation at 94 °C for 1 min, 15 cycles were performed in a thermocycler
with 30 sec at 94 °C, 1 min at 55 °C, and 2 min at 70 °C, followed by a final incubation of 5 min at
70 °C. To cover the entire DNA sample, 20 PCR reactions were run in parallel to obtain higher
yields during the amplification step. The resulting PCR products were then pooled and
further purified. To about 600 µl of DNA sample, 10 µl of 10 µg/µl proteinase K, 10 µl of 0.5 M
EDTA, and 10 µl of 10% SDS were added, and the mixture was incubated for 15 min at 45 °C.
The reaction mixture was then extracted once with the same volume of Tris-equilibrated
phenol:chloroform (ratio 1:1), and the DNA was precipitated from the aqueous phase with
isopropanol by adding to 600 µl of the sample 30 µl of 5 M sodium chloride, 3.5 µl of 1 µg/µl
glycogen, and 600 µl of isopropanol. After incubation at -80 °C for about 30 min, the DNA
was collected by centrifugation at 15,000 rpm for 20 min. After the pellet had been washed
twice with 500 µl of 80% ethanol, the DNA was finally re-suspended in 50 µl of 0.1x TE
buffer.

The PCR products were further purified on a 12% polyacrylamide gel. The appropriate band
of 119 bp was visualized by UV, identified by comparison to an appropriate marker,
cut out of the gel with a blade, transferred into a tube, crushed mechanically, and
extracted with 150 µl of a buffer containing 0.5 M ammonium acetate, 10 mM magnesium
acetate, 1 mM EDTA, pH 8.0, and 0.1% SDS for 1 h at 65 °C. The elution step was repeated
twice before the supernatants were filtered through a MicroSpin column (Amersham Pharmacia
Biosciences) by centrifugation at 3,000 rpm for 2 min. The centrifugation was repeated
after applying another 50 µl of 0.1x TE to the column. The resulting extract of about 300 µl
was then extracted once with the same volume of Tris-equilibrated phenol:chloroform (ratio
1:1), and the DNA was precipitated from the aqueous phase with ethanol by adding to 300 µl
of the sample 15 µl of 5 M sodium chloride, 3.5 µl of 1 µg/µl glycogen, and 750 µl of absolute
ethanol. After incubation at -80 °C for about 30 min, the DNA was collected by centrifugation
at 15,000 rpm for 20 min. After the pellet had been washed twice with 500 µl of 80% ethanol,
the DNA was finally re-suspended in 20 µl of 0.1x TE buffer.
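
The precipitation steps in this example follow a recurring recipe: 1/20 volume of 5 M sodium chloride, a fixed amount of glycogen carrier, and either 1 volume of isopropanol or 2.5 volumes of absolute ethanol. The helper below is a sketch of that arithmetic only. The function name and the fixed 3.5 µl glycogen default are choices made here for illustration, and a few steps in the text deviate slightly from this pattern.

```python
# Sketch: reagent volumes for the recurring alcohol precipitation recipe.
ALCOHOL_VOLUMES = {"isopropanol": 1.0, "ethanol": 2.5}  # volumes per sample volume


def precipitation_mix(sample_ul: float, alcohol: str = "isopropanol",
                      glycogen_ul: float = 3.5) -> dict:
    """Return the volumes (in µl) to add to `sample_ul` of aqueous phase."""
    if alcohol not in ALCOHOL_VOLUMES:
        raise ValueError(f"unknown alcohol: {alcohol}")
    if sample_ul <= 0:
        raise ValueError("sample volume must be positive")
    return {
        "5 M NaCl": sample_ul / 20,          # 1/20 volume of salt
        "1 ug/ul glycogen": glycogen_ul,     # carrier, fixed amount
        alcohol: sample_ul * ALCOHOL_VOLUMES[alcohol],
    }


if __name__ == "__main__":
    print(precipitation_mix(600))              # isopropanol, as after the PCR
    print(precipitation_mix(300, "ethanol"))   # ethanol, as after gel extraction
```

For a 300 µl gel extract this reproduces the 15 µl of sodium chloride and 750 µl of ethanol given above.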

Before cloning, the DNA fragments were re-amplified by a second PCR step under the same
conditions as described above. This second PCR amplification was preferable but not
essential to obtain sufficient amounts of DNA for the ligation. Briefly, the PCR amplification
was performed in a total volume of 50 µl with the following setup:

• DNA sample                  1 µl
• 10X buffer                  5 µl
• DMSO                        3 µl
• 2.5 mM dNTPs                12.5 µl
• Primer 1 (350 ng/µl)        0.5 µl
• Primer 2 (350 ng/µl)        0.5 µl
• ddH2O                       27.5 µl
• ExTaq (5 U/µl, TaKaRa)      0.5 µl

After an initial incubation at 94 °C for 1 min, 6 cycles were performed in a thermocycler with
30 sec at 94 °C, 1 min at 55 °C, and 2 min at 70 °C, followed by a final incubation of 5 min at 70 °C.
To cover the entire DNA sample, 20 PCR reactions were run in parallel to obtain higher
yields during the amplification step. The resulting PCR products were then pooled and
further purified. To about 600 µl of DNA sample, 10 µl of 10 µg/µl proteinase K, 10 µl of 0.5
M EDTA, and 10 µl of 10% SDS were added, and the mixture was incubated for 15 min at 45 °C.
The reaction mixture was then extracted once with the same volume of Tris-equilibrated
phenol:chloroform (ratio 1:1), and the DNA was precipitated from the aqueous phase with
isopropanol by adding to 600 µl of the sample 30 µl of 5 M sodium chloride, 3.5 µl of 1 µg/µl
glycogen, and 600 µl of isopropanol. After incubation at -80 °C for about 30 min, the DNA
was collected by centrifugation at 15,000 rpm for 20 min. After the pellet had been washed
twice with 500 µl of 80% ethanol, the DNA was finally re-suspended in 30 µl of 0.1x TE buffer.
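
Because 20 PCR reactions were run in parallel, the per-reaction setup listed above can be scaled to a pooled mix. The following sketch only does that multiplication. The dictionary keys and the optional pipetting excess are assumptions introduced here, and the text does not state whether a master mix was actually prepared.

```python
# Sketch: scale the per-reaction PCR setup to a pooled mix.
PCR_SETUP_UL = {
    "DNA sample": 1.0,
    "10X buffer": 5.0,
    "DMSO": 3.0,
    "2.5 mM dNTPs": 12.5,
    "Primer 1 (350 ng/ul)": 0.5,
    "Primer 2 (350 ng/ul)": 0.5,
    "ddH2O": 27.5,
    "ExTaq (5 U/ul)": 0.5,
}


def scale_setup(setup: dict, reactions: int, excess: float = 0.0) -> dict:
    """Multiply every component by the number of reactions.

    `excess` is a hypothetical safety margin (0.1 means 10% extra).
    """
    if reactions < 1:
        raise ValueError("need at least one reaction")
    factor = reactions * (1.0 + excess)
    return {name: volume * factor for name, volume in setup.items()}


if __name__ == "__main__":
    print("per reaction:", sum(PCR_SETUP_UL.values()), "ul")
    for name, volume in scale_setup(PCR_SETUP_UL, 20).items():
        print(f"{name:24s}{volume:8.1f} ul")
```

Note that the listed components add up to 50.5 µl per reaction, slightly more than the nominal 50 µl.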

The purified PCR product was, for the purpose of this example, digested with the restriction
enzymes XmaJI and XbaI. Note that cleavage with those two restriction enzymes creates the
same overhangs, which can be recombined during the formation of the concatemers.
However, the invention is not limited to the use of those two enzymes, as other restriction
enzymes can be used with similar results. The DNA was first cut with XmaJI in a 100 µl
reaction mixture composed of:

• DNA sample                      30 µl
• 10X buffer (Fermentas)          10 µl
• XmaJI (10 U/µl, Fermentas)      10 µl
• ddH2O                           50 µl



After incubation for 1 h at 37 °C, 2 µl of 10 µg/µl proteinase K, 2 µl of 0.5 M EDTA, and 2 µl of
10% SDS were added to the sample, and the mixture was incubated for 15 min at 45 °C. The
reaction mixture was then extracted once with the same volume of Tris-equilibrated
phenol:chloroform (ratio 1:1), and the DNA was precipitated from the aqueous phase with
isopropanol by adding to 200 µl of the sample 10 µl of 5 M sodium chloride, 3.5 µl of 1 µg/µl
glycogen, and 200 µl of isopropanol. After incubation at -80 °C for about 30 min, the DNA was
collected by centrifugation at 15,000 rpm for 20 min. After the pellet had been washed twice
with 500 µl of 80% ethanol, the DNA was finally re-suspended in 10 µl of 0.1x TE buffer.

For the second digestion, the aforementioned DNA was then cut with XbaI in a 110 µl
reaction mixture composed of:

• DNA sample                 10 µl
• 10X buffer (NEB)           11 µl
• 10X BSA (NEB)              11 µl
• XbaI (20 U/µl, NEB)        11 µl
• ddH2O                      67 µl



After incubation for 1 h at 37 °C, 2 µl of 10 µg/µl proteinase K, 2 µl of 0.5 M EDTA, and 2 µl of
10% SDS were added to the sample, and the mixture was incubated for 15 min at 45 °C. The
reaction mixture was then extracted once with the same volume of Tris-equilibrated
phenol:chloroform (ratio 1:1), and the DNA was precipitated from the aqueous phase with
isopropanol by adding to 200 µl of sample 10 µl of 5 M sodium chloride, 3.5 µl of 1 µg/µl
glycogen, and 200 µl of isopropanol. After incubation at -80 °C for about 30 min, the DNA was
collected by centrifugation at 15,000 rpm for 20 min. After the pellet had been washed twice
with 500 µl of 80% ethanol, the DNA was finally re-suspended in 10 µl of 0.1x TE buffer.
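
The statement that XmaJI and XbaI create the same overhangs can be made concrete with a short sketch. It assumes the usual cut positions C^CTAGG for XmaJI and T^CTAGA for XbaI, which are not spelled out in the text, and models only the top-strand sequence of the junctions formed during ligation.

```python
# Sketch: overhangs and ligation junctions for XmaJI and XbaI ends.
# Site and cut position (number of nt before the cut) are assumptions.
ENZYMES = {
    "XmaJI": ("cctagg", 1),  # C^CTAGG
    "XbaI": ("tctaga", 1),   # T^CTAGA
}


def overhang(enzyme: str) -> str:
    """5' overhang left by a palindromic site cut symmetrically."""
    site, cut = ENZYMES[enzyme]
    return site[cut:len(site) - cut]


def junction(left: str, right: str) -> str:
    """Sequence formed when an end cut by `left` is ligated to one cut by `right`."""
    left_site, left_cut = ENZYMES[left]
    right_site, right_cut = ENZYMES[right]
    return left_site[:left_cut] + right_site[right_cut:]


if __name__ == "__main__":
    sites = {site for site, _ in ENZYMES.values()}
    for left in ENZYMES:
        for right in ENZYMES:
            seq = junction(left, right)
            status = "recleavable" if seq in sites else "not recleavable"
            print(f"{left} end + {right} end -> {seq} ({status})")
```

Both enzymes leave a ctag overhang in this model, so any end can join any other end. Mixed junctions (cctaga, tctagg) match neither recognition sequence, whereas like-with-like junctions restore the original site.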

The resulting 33 bp DNA fragments were separated from the free DNA ends cut off during
the restriction digests by incubation with streptavidin-coated magnetic beads, which would
retain the biotin-labeled DNA fragments. Streptavidin-coated magnetic beads (Dynabeads)
were used at this point in a similar way as described before. About 100 µl of the original
slurry were incubated under occasional agitation with 5 µg of tRNA for about 20 min at room
temperature. After collection of the beads by magnetic force, the beads were washed three
times with 100 µl of 1x B&W. The aforementioned DNA sample was then mixed with the
beads and incubated at room temperature for 15 min under continuous agitation, and the
supernatant was taken off after collection of the magnetic beads by magnetic force. The
beads were then rinsed one more time with 50 µl of 1x B&W buffer, and the collected
supernatants were subjected to isopropanol precipitation of the DNA. To about 250 µl of
sample, 7.5 µl of 5 M sodium chloride, 3.5 µl of 1 µg/µl glycogen, and 250 µl of isopropanol
were added. After incubation at -80 °C for about 30 min, the DNA was collected by
centrifugation at 15,000 rpm for 20 min. After the pellet had been washed twice with 500 µl of
80% ethanol, the DNA was finally re-suspended in 10 µl of 0.1x TE buffer.

The DNA was further purified by RNase I and proteinase K treatment. To the
aforementioned 10 µl sample, 5 µl of 10x RNase I buffer (Promega), 2 µl of RNase I (Promega),
and 33 µl of water were added. The resulting reaction mixture was incubated for 15 min at
37 °C, followed by the addition of 1 µl of 10 µg/µl proteinase K, 1 µl of 0.5 M EDTA, and 1 µl
of 10% SDS, and an additional incubation of 15 min at 45 °C. The reaction mixture was then
extracted once with the same volume of Tris-equilibrated phenol:chloroform (ratio 1:1), and
the DNA was precipitated from the aqueous phase with isopropanol by adding to 100 µl of
the sample 5 µl of 5 M sodium chloride, 3.5 µl of 1 µg/µl glycogen, and 100 µl of isopropanol.
After incubation at -80 °C for about 30 min, the DNA was collected by centrifugation at
15,000 rpm for 20 min. After the pellet had been washed twice with 500 µl of 80% ethanol, the
DNA was finally re-suspended in 40 µl of 0.1x TE buffer.

The DNA fragments were further purified on a 12% polyacrylamide gel. The appropriate band
of 33 bp, identified by comparison with a suitable molecular weight marker, was cut out of
the gel with a blade, transferred into a tube, crushed mechanically, and extracted with
150 µl of a buffer containing 0.5 M ammonium acetate, 10 mM magnesium acetate, 1 mM
EDTA, pH 8.0, and 0.1% SDS for 1 h at 37 °C. The extraction step was repeated twice before
the supernatants were filtered through a MicroSpin column (Amersham Pharmacia Biosciences) by
centrifugation at 3,000 rpm for 2 min. The centrifugation was repeated after applying
another 50 µl of 0.1x TE to the column. The resulting extract of about 300 µl was then
extracted once with the same volume of Tris-equilibrated phenol:chloroform (ratio 1:1), and
the DNA was precipitated from the aqueous phase with ethanol by adding to 300 µl of the
sample 15 µl of 5 M sodium chloride, 3.5 µl of 1 µg/µl glycogen, and 750 µl of absolute
ethanol. After incubation at -80 °C for about 30 min, the DNA was collected by centrifugation
at 15,000 rpm for 20 min. After the pellet had been washed twice with 500 µl of 80% ethanol, the
DNA was finally re-suspended in 4 µl of water.

In the next step of the invention, DNA fragments comprising 5' ends were ligated with each
other to form concatemers. For this ligation the following reaction was set up:

• DNA sample                                          4 µl
• 10X T4 DNA ligase buffer (New England Biolabs)      1 µl
• T4 DNA ligase (40 U, New England Biolabs)           1 µl
• 50% PEG 8000                                        4 µl

After an incubation of 45 min at 16 °C, the reaction was stopped by adding 1 µl of 0.5 M EDTA,
1 µl of 10% SDS, 1 µl of 10 µg/µl proteinase K, and 35 µl of water, followed by an additional
incubation of 15 min at 45 °C. The reaction mixture was then extracted once with the same
volume of Tris-equilibrated phenol:chloroform (ratio 1:1), and the DNA was precipitated from
the aqueous phase with isopropanol by adding to 100 µl of the sample 5 µl of 5 M sodium
chloride, 3.5 µl of 1 µg/µl glycogen, and 100 µl of isopropanol. After incubation at -80 °C for
about 30 min, the DNA was collected by centrifugation at 15,000 rpm for 20 min. After
the pellet had been washed twice with 500 µl of 80% ethanol, the DNA was finally
re-suspended in 10 µl of 0.1x TE buffer.

The aforementioned ligation reaction yielded concatemers of various lengths, and a size
selection was performed to clone only concatemers of a suitable length for sequencing, e.g.
longer or shorter than 500 bp. Therefore the concatemers were fractionated on an 8%
polyacrylamide gel, and bands of a size larger than 500 bp and bands of 200 to 500 bp were cut
out of the gel with a blade and further processed separately. After the gel pieces had been
transferred into a tube, they were crushed mechanically and extracted with 150 µl of a buffer
containing 0.5 M ammonium acetate, 10 mM magnesium acetate, 1 mM EDTA, pH 8.0, and
0.1% SDS for 1 h at 65 °C. The extraction step was repeated twice before the supernatants
were filtered through a MicroSpin column (Amersham Biosciences) by centrifugation at 3,000
rpm for 2 min. The centrifugation was repeated after applying another 50 µl of 0.1x TE to
the column. The resulting extract of about 300 µl was then extracted once with the same
volume of Tris-equilibrated phenol:chloroform (ratio 1:1), and the DNA was precipitated from
the aqueous phase with ethanol by adding to 300 µl of the sample 15 µl of 5 M sodium
chloride, 3.5 µl of 1 µg/µl glycogen, and 750 µl of absolute ethanol. After incubation at
-80 °C for about 30 min, the DNA was collected by centrifugation at 15,000 rpm for 20 min.
After the pellet had been washed twice with 500 µl of 80% ethanol, the DNA was finally
re-suspended in 2 µl of water.
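
The size selection above can be summarized as a simple binning rule. The sketch below assigns a concatemer length to one of the two gel fractions and gives a rough estimate of the number of tags it carries. The fraction labels and the 33 bp unit length are assumptions taken from the fragment size purified earlier; the true unit length after ligation may differ by a few base pairs.

```python
# Sketch: assign concatemers to gel fractions and estimate tags per concatemer.
TAG_UNIT_BP = 33  # assumed length of one tag unit, from the 33 bp band above


def size_fraction(length_bp: int) -> str:
    """Return the gel fraction a concatemer of `length_bp` falls into."""
    if length_bp > 500:
        return "larger than 500 bp"
    if length_bp >= 200:
        return "200 to 500 bp"
    return "not recovered"


def approx_tags(length_bp: int, unit_bp: int = TAG_UNIT_BP) -> int:
    """Rough number of complete tag units in a concatemer."""
    return length_bp // unit_bp


if __name__ == "__main__":
    for length in (120, 350, 500, 650):
        print(length, "bp:", size_fraction(length), "-", approx_tags(length), "tags")
```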

In the final cloning step the concatemers were cloned into the vector pZErO-1 (Invitrogen),
which was linearized under standard conditions with Xba I and further purified by gel
electrophoresis. For this ligation the following reaction was set up:

• Purified concatemer                                 2 µl
• XbaI-digested pZErO-1 (100 ng/µl)                   1.25 µl
• 10X T4 DNA ligase buffer (New England Biolabs)      0.5 µl
• T4 DNA ligase (24 U, New England Biolabs)           0.6 µl
• Water                                               0.65 µl

After an overnight incubation at 16 °C, the reaction was terminated by heat treatment for 5
min at 65 °C followed by adding 1 µl of 0.5 M EDTA, 1 µl of 10% SDS, 1 µl of 10 µg/µl
proteinase K, and 30 µl of water, followed by an additional incubation of 15 min at 45 °C. The
reaction mixture was then extracted once with the same volume of Tris-equilibrated
phenol:chloroform (ratio 1:1), and the DNA was precipitated from the aqueous phase with
isopropanol by adding to 100 µl of the sample 5 µl of 5 M sodium chloride, 3.5 µl of 1 µg/µl
glycogen, and 100 µl of isopropanol. After incubation at -80 °C for about 30 min, the DNA
was collected by centrifugation at 15,000 rpm for 20 min. After the pellet had been washed
twice with 500 µl of 80% ethanol, the DNA was finally re-suspended in 6 µl of water. Using 1
µl of the aforementioned desalted ligation solution, ElectroMAX™ DH10B™ Cells
(Invitrogen) were transformed by electroporation using a Cell-Porator (Biometrer) according
to the transformation procedures described in the manufacturer's manual. Transformed
bacteria were selected on LB medium containing 50 µg/ml Zeocin (Invitrogen), and positive
clones thereof were isolated and further characterized as described in the Examples below.

Example 2: Alternative preparation of 5' end specific tags involving the formation of di-tags
Preparation of total RNA from tissue
A variety of different approaches for the preparation of RNA have been described in the
literature and are known to a person skilled in the art. All such
approaches should allow the preparation of a plurality of RNA samples derived from
biological materials, including tissues and cells, which are suitable for the invention. Below,
two such procedures are described in detail.
Buffers and solutions:

a) Solution D: 4 M guanidinium thiocyanate, 25 mM sodium citrate (pH 7.0), 100 mM
2-mercaptoethanol and 0.5% n-lauryl-sarcosine.

b) RNase-free CTAB/urea solution: 1% CTAB (Sigma), 4 M urea, 50 mM Tris-HCl
(pH 7.0), 1 mM EDTA (pH 8.0).

c) Water-equilibrated phenol as described in Molecular Cloning (Sambrook and Russell,
2001).

d) Phosphate-buffered saline (PBS) as described in Molecular Cloning (Sambrook and Russell,
2001).

e) 5 M sodium chloride

f) 7 M guanidinium chloride

g) RNase-free dd-water

Protocol for total RNA preparation
Dissect the tissue as fast as possible in a cooled dish.

Roughly evaluate the volume of tissue in a 50 ml Falcon tube. The best quantity of tissue is
between 0.5 and 1 g per 20 ml of Solution D.

Add 2 ml of 2 M sodium acetate (pH 4.0) and 16 ml of water-equilibrated phenol.
Mix by vortexing. Add 4 ml of chloroform and shake vigorously by hand and with a vortex.
Let it stand on ice for 15 min.

Centrifuge at 6,000 rpm for 30 min at 4 °C.

Transfer the upper aqueous phase (25 ml) to a new tube by pipetting and recover
approximately 20 ml thereof.

Precipitate the RNA from the aqueous phase by adding 1 volume of isopropanol (in
this case, approximately 20 ml) and store on ice for 1 h.

Centrifuge at 7,500 rpm for 15 min at 4 °C to pellet the RNA.
The pellet is washed twice with 70% ethanol, each time followed by centrifugation at 7,500
rpm for 2 min, in order to remove the SCN salts.

CTAB removal of polysaccharides. Selective CTAB precipitation of mRNA is performed
after complete RNA re-suspension in 4 ml of water. Subsequently, 1.3 ml of 5 M NaCl is
added and the RNA is then selectively precipitated by adding 16 ml of the CTAB/urea solution.
Centrifuge for 15 min at 7,500 rpm (9,500 x g) and discard the aqueous phase.
Re-suspend the RNA pellet in 4 ml of 7 M guanidinium chloride.

The re-suspended RNA is finally precipitated by adding 8 ml of ethanol. Incubate at -20 °C for
1-2 hours (or longer) and centrifuge for 15 min at 7,500 rpm, 4 °C. At the end, wash the pellet
with 5 ml of 70% ethanol.

Centrifuge again at 7,500 rpm for 5 min.

Discard the supernatant.

Re-suspend the RNA in 500-1000 µl of RNase-free dd-water.
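
The extraction volumes in the protocol above are given for 20 ml of Solution D. If a different amount of tissue is processed, the reagents can be scaled proportionally, as in the sketch below. Linear scaling is an assumption made here for illustration; the protocol itself only states the 20 ml case and the recommended 0.5 to 1 g of tissue.

```python
# Sketch: scale the extraction reagents, assuming linear scaling with Solution D.
REFERENCE_SOLUTION_D_ML = 20.0
REAGENTS_PER_REFERENCE_ML = {
    "2 M sodium acetate (pH 4.0)": 2.0,
    "water-equilibrated phenol": 16.0,
    "chloroform": 4.0,
}


def solution_d_range_ml(tissue_g: float) -> tuple:
    """Range of Solution D volumes for 0.5 to 1 g of tissue per 20 ml."""
    if tissue_g <= 0:
        raise ValueError("tissue mass must be positive")
    low = REFERENCE_SOLUTION_D_ML * tissue_g / 1.0   # 1 g per 20 ml
    high = REFERENCE_SOLUTION_D_ML * tissue_g / 0.5  # 0.5 g per 20 ml
    return low, high


def extraction_volumes(solution_d_ml: float) -> dict:
    """Reagent volumes (ml) for a given volume of Solution D."""
    scale = solution_d_ml / REFERENCE_SOLUTION_D_ML
    return {name: ml * scale for name, ml in REAGENTS_PER_REFERENCE_ML.items()}


if __name__ == "__main__":
    low, high = solution_d_range_ml(0.5)
    print(f"0.5 g tissue: {low:.0f} to {high:.0f} ml Solution D")
    print(extraction_volumes(10.0))
```
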

Preparation of an mRNA fraction from total RNA

The mRNA fraction of total RNA preparations can be isolated by the use of commercial kits
such as the MACS mRNA isolation kit (Miltenyi) or polyA-quick (Stratagene), which provide
a satisfactory yield of mRNA under the recommended conditions. One cycle of oligo-dT
selection of the mRNA is sufficient. It is advisable to redissolve the poly-A+ RNA at a high
concentration of 1 to 2 µg/µl.

Preparation of a plurality of RNA samples from a cDNA library

Alternatively, a plurality of nucleic acids corresponding to the 5' ends of genes can be
obtained from existing cDNA libraries that were cloned into expression vectors. RNA
transcripts can be obtained from such libraries by standard methods known to a person skilled
in the art of molecular biology, namely by in vitro transcription
reactions using e.g. a T3, T7 or SP6 RNA polymerase. Such an approach can be performed
by first linearizing the plasmid DNA with appropriate restriction endonucleases. The
restriction enzyme can be chosen to allow for the transcription of the sense RNA. In the case
of libraries obtained in the vector pFLC III (Carninci P, et al., Genomics, 2001 Sep;77(1-2):79-90),
the vector can be linearized by cleavage with one of the homing endonucleases
I-Ceu I or PI-Sce I to avoid truncation of the inserts. For the digest, mix in a tube:

Plasmid DNA              100 µg
10x buffer               40 µl
Restriction enzyme       100 U
ddH2O                    ad 400 µl

Incubate at the appropriate temperature for at least 2 h and analyze 1 µl of the reaction mixture by
agarose gel electrophoresis. If the digest is complete, add:

0.5 M EDTA                   8 µl
10% SDS                      8 µl
Proteinase K (10 mg/ml)      5 µl

Incubate for 15 min at 45 °C before extracting the sample with 500 µl of phenol/chloroform. The
aqueous phase is to be re-extracted twice with 500 µl of chloroform. Finally, the linearized DNA is
precipitated with isopropanol or ethanol under standard conditions and dissolved in 50 µl of TE.

In vitro RNA synthesis:

Mix in a tube under RNase-free conditions:

Linearized plasmid DNA      20 µg
5x T7 or T3 buffer          200 µl
0.1 M DTT                   100 µl
2 mg/ml BSA                 40 µl
10 mM rNTPs                 50 µl
T7 or T3 RNA polymerase     10 µl
ddH2O                       ad 1000 µl

Incubate at 37 °C for 3 to 4 h before adding:

10 mM calcium chloride      10 µl
1 U/µl DNase RQ1            5 µl

Incubate at 37 °C for 20 min before adding:

0.5 M EDTA                  10 µl
10 mg/ml Proteinase K       5 µl

Incubate at 45 °C for 30 min before adding sodium chloride to a final concentration of
1 M. Phenol/chloroform extraction followed by re-extraction with chloroform should be
performed under standard conditions, and the RNA transcripts can finally be collected by
isopropanol or ethanol precipitation. The pellet is resuspended in 200 µl of water or TE.
The quality of the RNA transcripts should be confirmed by agarose gel electrophoresis and
quantification.
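As a cross-check on the transcription mix above, the final concentration of each component follows from C1·V1 = C2·V2. The following is a minimal sketch only: the function name is hypothetical, it assumes the 1000 µl final volume given in the table, and it takes "10 mM rNTPs" at face value as the stock concentration.

```python
def final_concentration(stock_conc, stock_volume_ul, total_volume_ul):
    """Concentration after dilution, in the same unit as stock_conc (C1*V1 = C2*V2)."""
    return stock_conc * stock_volume_ul / total_volume_ul


TOTAL_UL = 1000.0  # final reaction volume from the table above

dtt_mM = final_concentration(100.0, 100.0, TOTAL_UL)         # 0.1 M DTT stock = 100 mM
rntp_mM = final_concentration(10.0, 50.0, TOTAL_UL)          # 10 mM rNTP stock
bsa_ug_per_ml = final_concentration(2000.0, 40.0, TOTAL_UL)  # 2 mg/ml BSA = 2000 ug/ml

print(f"DTT {dtt_mM} mM, rNTPs {rntp_mM} mM, BSA {bsa_ug_per_ml} ug/ml")
```

Under these assumptions the mix contains 10 mM DTT, 0.5 mM rNTPs and 80 µg/ml BSA.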

First strand cDNA synthesis 

Buffers and solutions

Saturated trehalose, about 80% in water (crystals will remain), low metal content
4.9 M high-purity sorbitol
Optionally: Takara GC-Taq buffer

Enzymes and buffers

RNase H- reverse transcriptase SuperScript II (Invitrogen) and its buffer, or other reverse
transcriptases.

Nucleic acids and oligonucleotides 
Purified first-strand oligo-dT primer (sequence of the primer used:

S-GAGAGAGAGAGGATCCTTCTGGAGAGTlllllllll^llllU'VN-S 1 ) (SEQ ID NO: 
10). Alternatively or additionally, a random primer (dN6 to dN9), where N is any nucleotide.
mRNA, recommended 2.5 to 25 µg, or alternatively total RNA, 5-50 µg

Radioactive compounds

[alpha-32P] dGTP

Protocol A: Trehalose-Sorbitol enhanced

To prepare the 1st strand cDNA, combine the following reagents in three different
0.5 ml PCR tubes (A, B, and C).

Tube A: in a final volume of 21.3 µl, add the following:
mRNA                            2.5-25 µg
or total RNA                    5-50 µg
1st strand primer (2 µg/µl)     14 µg (7 µl)
Total volume:                   22 µl

Heat the mixture (mRNA, primer) at 65 °C for 10 min to dissolve the secondary structures of
the mRNA.

Tube B: in a final volume of 76 µl, add the following:

5X 1st strand buffer                                         28.6 µl
0.1 M DTT                                                    11 µl
dATP, dTTP, dGTP, and 5-methyl-dCTP, 10 mM each              9.3 µl
4.9 M sorbitol                                               55.4 µl
Saturated trehalose                                          23.2 µl
RNase H- SuperScript II reverse transcriptase (200 U/µl)     15.0 µl
Final volume:                                                142.5 µl
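The component volumes listed for Tube B can be checked against the stated final volume with a short sum. This is only an illustrative sketch; the dictionary keys are shorthand labels chosen here, not names from the protocol.

```python
# Component volumes of Tube B (Protocol A), in microlitres, as listed above.
tube_b_ul = {
    "5X 1st strand buffer": 28.6,
    "0.1 M DTT": 11.0,
    "dNTP mix (10 mM each)": 9.3,
    "4.9 M sorbitol": 55.4,
    "saturated trehalose": 23.2,
    "reverse transcriptase": 15.0,
}

# Round to one decimal to avoid floating-point noise in the comparison.
total_ul = round(sum(tube_b_ul.values()), 1)
print(f"Tube B total: {total_ul} ul")
```

The listed components add up to the 142.5 µl given as the final volume.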



Prepare a cycle (on a thermal cycler) with: 40 °C, 4 min; 50 °C, 2 min; 56 °C, 60 min.
If total RNA is used as the starting material, prepare a cycle with:
40 °C, 2 min, -0.1 °C/sec to 35 °C; 50 °C, 2 min; 56 °C, 60 min.
Alternatively: prime the cDNA with a random primer (dN9, N = any nucleotide) at 25 °C.

Tube C:

1-1.5 µl of [alpha-32P] dGTP.

For a cold start, operate as follows:
Quickly mix tubes A and B on ice.
Transfer 40 µl of the A+B mixture into tube C.
Tubes A+B and C should be transferred immediately to 40 °C (step 1 of the
above cycling program) to anneal at 40 °C for 4 minutes.
Let the reaction proceed following the thermal cycler setting.

For a hot start, operate as follows:
Transfer tubes A, B, and C to the thermal cycler.
Start the cycling.
When the temperature reaches 42 °C, quickly mix tubes A and B.
Transfer 40 µl of the A+B mixture into tube C.
Let the reaction proceed following the thermal cycler setting.

Protocol B: GC I-Trehalose-Sorbitol enhanced

Tube A: in a final volume of 22 µl, add the following:

mRNA                                         5-25 µg
(precipitate with ethanol and re-suspend directly with the primer)
or total RNA                                 up to 50 µg (for the small-scale protocol)
Purified 1st strand cDNA primer (2 µg/µl)    14 µg (7 µl)
Final volume:                                22 µl

Tube B: add the following:

2X GC I (LA Taq) buffer (TaKaRa)                       75 µl
dATP, dTTP, dGTP, and 5-methyl-dCTP, 10 mM each        4 µl
4.9 M sorbitol                                         20 µl
Saturated trehalose (approximately 80%)                10 µl
SuperScript II reverse transcriptase (200 U/µl)        15 µl
ddH2O                                                  4 µl
Final volume:                                          128 µl

Tube C:

alpha-32P-dGTP                                         1-5 µl

For the rest of the procedure, follow exactly the same steps as in the normal reaction
condition. Prepare (in advance) a thermal cycler with the following cycle:
42 °C, 30 min; 50 °C, 10 min; 55 °C, 10 min; 4 °C, indefinite time.

Operate as follows:
1) Transfer tubes A, B, and C to the thermal cycler.
2) Start the cycling.
3) When the temperature reaches 42 °C, quickly mix tubes A and B.
4) Transfer 40 µl of the A+B mixture into tube C.
5) Let the reaction proceed following the thermal cycler setting.
At the end, stop the reaction with EDTA at 10 mM final concentration.
Then the incorporation of [alpha-32P]dGTP is measured and the yield of cDNA is
calculated. Calculating the amount of cDNA from the incorporated [alpha-32P]dGTP is useful for
monitoring whether the processes are proceeding correctly.
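The text does not spell out how the yield is derived from the incorporated label. One common way to do it is sketched below; it is an assumption rather than the method of this Example. It presumes that the measured fraction of incorporated radioactivity equals the fraction of all four nucleotides incorporated, and it uses an approximate average mass of 330 g/mol per incorporated nucleotide. The function name and the 5% example value are hypothetical.

```python
AVG_NT_MASS_NG_PER_NMOL = 330.0  # approximate average mass of one incorporated nucleotide (assumption)


def cdna_yield_ug(incorporated_fraction, dntp_mM_each, dntp_volume_ul, n_dntps=4):
    """Estimate the cDNA yield in micrograms from the fraction of label incorporated."""
    # mM * ul = nmol, so 10 mM * 9.3 ul = 93 nmol of each nucleotide
    total_nmol = dntp_mM_each * dntp_volume_ul * n_dntps
    incorporated_nmol = incorporated_fraction * total_nmol
    return incorporated_nmol * AVG_NT_MASS_NG_PER_NMOL / 1000.0


# Protocol A uses 9.3 ul of a mix with 10 mM of each nucleotide; 5% incorporation is a made-up value.
example_yield = cdna_yield_ug(0.05, 10.0, 9.3)
print(f"Estimated yield: {example_yield:.2f} ug")
```

With these illustrative numbers, 5% incorporation corresponds to roughly 6 µg of first-strand cDNA.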

CTAB precipitation of the first-strand cDNA

Buffers and solutions
CTAB solution as described in Example 1

After measuring the radioactivity, transfer both the "hot" and "cold" 1st strand synthesis
reactions (tubes B and C) to one tube and perform CTAB precipitation as follows.
Mix tubes B and C from the first-strand synthesis; to the mixture add:
3 µl of 0.5 M EDTA (final concentration of 10 mM)
2 µl of 10 µg/µl Proteinase K.

Incubate at 45 °C or 50 °C for at least 15 min, and for up to 1 hour.

To the 128-142 µl volume of the first-strand cDNA reaction, add:
32 µl of 5 M sodium chloride (RNase free)
320 µl of CTAB-Urea solution

Incubate at room temperature for 10 min.
Centrifuge at 15,000 rpm for 10 min.
Remove the supernatant.
Carefully re-suspend with 100 µl of 7 M guanidinium chloride.
Add 250 µl of ethanol and leave on ice or at -20 to -80 °C for 30-60 min.
Centrifuge at 15,000 rpm for 10 min. Remove the supernatant.

Subsequently, wash the pellet twice with 800 µl of 80% ethanol. Each time, add 80% ethanol
to the tube and centrifuge for 3 min at 15,000 rpm.
Re-suspend the cDNA in 46 µl of water.

Cap-trapping, oxidation and biotinylation of the cap

Buffers and solutions
1 M sodium acetate buffer, pH 4.5
1 M citrate buffer, pH 6.0
NaIO4 solution, >100 mM
SDS 10%
Biotinylation buffer: 33 mM sodium citrate, pH 6.0, and 0.33% SDS.
10 mM biotin hydrazide long arm (MW = 371.51; 3.71 mg/ml = 10 mM) in
citrate/SDS buffer.
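The stated equivalence of 3.71 mg/ml and 10 mM for the biotin hydrazide follows directly from the molecular weight. A minimal sketch of the conversion (the function name is hypothetical):

```python
def mg_per_ml(molecular_weight_g_per_mol, concentration_mM):
    """Mass concentration in mg/ml for a given molar concentration in mM."""
    # g/mol * mmol/l = mg/l; dividing by 1000 gives mg/ml
    return molecular_weight_g_per_mol * concentration_mM / 1000.0


biotin_hydrazide_mg_per_ml = mg_per_ml(371.51, 10.0)
print(f"{biotin_hydrazide_mg_per_ml:.2f} mg/ml")
```

This reproduces the figure given in the buffer list to two decimal places.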

Cap biotinylation: (A) Oxidation of the diol groups of mRNA

In a final volume of 50 to 55 µl, add the following:
The re-suspended cDNA sample
3.3 µl of 1 M sodium acetate buffer, pH 4.5
A freshly prepared solution of NaIO4 to a final concentration of 10 mM
Incubate on ice in the dark for 45 min.
Finally, precipitate the cDNA:

To simplify the downstream process, add 1 µl of 80% glycerol.
Vortex.
Add 0.5 µl of 10% SDS, 11 µl of 5 M sodium chloride and 61 µl of isopropanol.
Incubate at -20 or -80 °C for 30 min in the dark.
Centrifuge for 15 min at 15,000 rpm.
Remove the supernatant.
Add 500 µl of 80% ethanol.
Centrifuge at 15,000 rpm for 2-3 min.
Discard the supernatant.
Repeat the 80% ethanol wash and centrifugation.
Re-suspend the cDNA in 50 µl of water.

Biotinylation: (B) Derivatization of the oxidized diol groups
To the cDNA (50 µl), add 160 µl of the biotin hydrazide long arm dissolved in the reaction
buffer. Perform the reaction in 210 µl (final volume).
Incubate overnight (10-16 hours) at room temperature (22-26 °C).
Subsequently, to precipitate the biotinylated cDNA, add:
75 µl of 1 M sodium citrate, pH 6.1
5 µl of 5 M sodium chloride
750 µl of absolute ethanol

Incubate on ice for 1 hour or at -80 or -20 °C for 30 min or longer.
Centrifuge the sample at 15,000 rpm for 10 min.
Wash the precipitate twice with 70% or 80% ethanol and centrifuge.
Discard the supernatant and repeat the wash, then dissolve the cDNA in 175 µl of TE (1 mM Tris,
pH 7.5, 0.1 mM EDTA).

Cap-trapping and releasing the 5' ends of cDNA

Enzymes and buffers
RNase ONE (Promega) and its reaction buffer

To the cDNA sample add, in a final volume of 200 µl:
20 µl of RNase I buffer (Promega).
1 unit of RNase I (Promega, 5 or 10 U/µl) per 1 µg of starting mRNA or total RNA (in
the case of the small-scale protocol) used for first-strand cDNA synthesis.
Incubate at 37 °C for 30 min.
To stop the reaction, put the sample on ice and add
4 µl of 10% SDS and
3 µl of 10 µg/µl Proteinase K.
Incubate at 45 °C for 15 min.
Extract once with 1:1 Tris-equilibrated phenol:chloroform, then load the aqueous phase into a
Microcon-100.
Perform a back extraction with water and load again into the Microcon-Centricon 100 filter.
Perform one round of Microcon separation.
8-b) Dissolve the pellet completely in 20 µl of 0.1x TE.
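The RNase I dose in the step above (1 unit per µg of starting RNA, from a 5 or 10 U/µl stock) translates into a pipetting volume as sketched here. The function name is hypothetical and the 25 µg input is just an example amount from the recommended mRNA range.

```python
def rnase_i_volume_ul(starting_rna_ug, stock_units_per_ul, units_per_ug=1.0):
    """Volume of RNase I stock needed for the given amount of starting RNA."""
    if stock_units_per_ul <= 0:
        raise ValueError("stock concentration must be positive")
    return starting_rna_ug * units_per_ug / stock_units_per_ul


vol_5u = rnase_i_volume_ul(25.0, 5.0)    # 5 U/ul stock
vol_10u = rnase_i_volume_ul(25.0, 10.0)  # 10 U/ul stock
print(f"{vol_5u} ul at 5 U/ul, {vol_10u} ul at 10 U/ul")
```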

Magnetic beads blocking

Materials
Streptavidin-coated MPG (CPG Inc., New Jersey)

Buffers and solutions
Binding buffer: 4.5 M NaCl, 50 mM EDTA, pH 8.0

Special equipment
A magnetic stand to hold 1.5 ml tubes is required.

To further minimize the non-specific binding of nucleic acids, the magnetic beads are
pre-incubated with DNA-free tRNA (10 mg/ml).
For each preparation, pre-incubate 500 µl of magnetic beads (per 25 µg of starting mRNA)
with 100 µg of tRNA.
Incubate on ice for 30 min with occasional mixing.
Separate the beads with a magnetic stand (for 3 min) and remove the supernatant.
Wash 3 times with 500 µl of binding buffer.
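The blocking amounts above are given per 25 µg of starting mRNA. For other input amounts, a proportional scaling could look like the sketch below. Linear scaling is an assumption made here, not a statement of the protocol, and the function name is hypothetical.

```python
def blocking_amounts(mrna_ug, trna_stock_mg_per_ml=10.0):
    """Scale beads and tRNA linearly from the 500 ul beads / 100 ug tRNA per 25 ug mRNA reference."""
    beads_ul = 500.0 * mrna_ug / 25.0
    trna_ug = 100.0 * mrna_ug / 25.0
    trna_ul = trna_ug / trna_stock_mg_per_ml  # mg/ml is numerically equal to ug/ul
    return beads_ul, trna_ug, trna_ul


beads_ul, trna_ug, trna_ul = blocking_amounts(10.0)
print(f"{beads_ul} ul beads, {trna_ug} ug tRNA ({trna_ul} ul of stock)")
```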



5'-end cDNA capture and release

To capture the full-length cDNA, mix the RNase I-treated cDNA and the washed beads as follows:

1) Re-suspend the beads in 500 µl of wash/binding buffer.
2) Transfer 350 µl of the beads into the tube containing the biotinylated first-
strand cDNA.
3) After mixing, gently rotate the tube for 10 min at 50 °C.
4) Transfer 150 µl of the beads into the tube containing the biotinylated first-
strand cDNA and 350 µl of beads.
5) After mixing, gently rotate the tube for 20 min at 50 °C.
Separate the beads from the supernatant on a magnetic stand.

Washing the beads

Gently wash the beads with 0.5 ml of the indicated buffer to remove the nonspecifically
adsorbed cDNAs.
2 x with washing/binding solution.
1 x with 0.3 M NaCl / 1 mM EDTA.
2 x with 0.4% SDS / 0.5 M NaOAc / 20 mM Tris-HCl pH 8.5 / 1 mM EDTA.
2 x with 0.5 M NaOAc / 10 mM Tris-HCl pH 8.5 / 1 mM EDTA.
Alkali release (see below)

Alkali full-length cDNA release from beads
Add 100 µl of 50 mM NaOH, 5 mM EDTA.
Briefly stir and incubate for 5 min at RT with occasional mixing.
Separate the magnetic beads and transfer the eluted cDNA to ice.
Repeat the elution cycle with 100 µl of 50 mM NaOH, 5 mM EDTA two more times, until
most of the cDNA (80-90%, as measured by monitoring the radioactivity) has been recovered
from the beads.

Adding a 5'-end primable site to the cDNA
RNase step
Enzymes and buffers

- RNase ONE™ and its buffer (Promega)

Add 50 µl of 1 M Tris-HCl, pH 7.0, to the tubes on ice and mix quickly.
Add 1 µl of RNase I (10 U/µl) and mix quickly.
Incubate at 37 °C for 10 min.
To remove the RNase I, treat the cDNA with Proteinase K and phenol/chloroform extraction,
including back extraction.

Add 3 µg of glycogen. Treat the cDNA with one cycle of Microcon-100.

Fractionation of cDNA before adding a primable site

Materials
Amersham-Pharmacia S-400 spun kit or alternative kits

Buffers and solutions
Column buffer: 10 mM Tris, pH 8.0, 1 mM EDTA, 0.1% SDS, and 100 mM NaCl
Column buffer without SDS: 10 mM Tris, pH 8.0, 1 mM EDTA and 100 mM NaCl

S-400 spun column chromatography
Detailed protocols are described in the kits. This is the running protocol for S-400 spun
columns.

Shake the column.
Break the seal and transfer the column to a 2 ml tube.
Centrifuge at 3,000 rpm for 1 min (+4 °C).
Add the cDNA (< 20 µl volume).
After the cDNA, add 80 µl of water.
Centrifuge for 2 min at 3,000 rpm.

Concentrate by Microcon 100 or precipitate with isopropanol. Recovery should exceed 80%.

SSLLM
Materials
S-300 spun column chromatography kit (Amersham-Pharmacia)

Buffers and solutions
Column buffer: 10 mM Tris-HCl pH 8.0, 1 mM EDTA, 0.1% SDS, 100 mM NaCl.

Enzymes and buffers
Takara DNA Ligase Kit II.

Nucleic acids and oligonucleotides

In the Example given here, the recognition sites for the restriction enzymes Bgl II, Gsu I and
Mme I are introduced; however, the invention is not dependent on or limited to the use of those
restriction enzymes and their recognition sites. In particular, Bgl II (recognition site:
AGATCT) can be replaced by any endonuclease suitable for cloning. Other examples of such
enzymes include Asc I (recognition site: GGCGCGCC) and Xba I (recognition site:
TCTAGA).

Synthesize the following oligonucleotides containing the Gsu I restriction site.
Oligonucleotide Bg-Gsu-GN5:
5'-Biotin-AGAGAGAGAACTAGGCTTAATAGGTGACTAGATCTGGAGGNNNNN-3'
(SEQ ID NO: 11);
Oligonucleotide Bg-Gsu-N6:
5'-Biotin-AGAGAGAGAACTAGGCTTAATAGGTGACTAGATCTGGAGNNNNNN-3'
(SEQ ID NO: 12);
Oligonucleotide Bg-Gsu-down:
5'-P-CTGGAGATCTAGTCACCTATTAAGCCTAGTTCT-3' (SEQ ID NO:
13).

Synthesize the following oligonucleotides containing the Mme I restriction site.
Oligonucleotide Bg-Mme-GN5:
5'-Biotin-AGAGAGAGAACTAGGCTTAATAGGTGACTAGATCTTCCRACGNNNNN-3'
(SEQ ID NO: 14);
Oligonucleotide Bg-Mme-N6:
5'-Biotin-AGAGAGAGAACTAGGCTTAATAGGTGACTAGATCTTCCRACNNNNNN-3'
(SEQ ID NO: 15);
Oligonucleotide Bg-Mme-down:
5'-P-GTYGGAGATCTAGTCACCTATTAAGCCTAGTTCTCTCTCT-NH2-3' (SEQ ID
NO: 16).

Where R stands for G or A and Y stands for C or T.

P means that the oligonucleotide must be 5'-phosphorylated, and NH2 indicates that an amino
group is added to avoid non-specific ligation and possible hairpin priming.
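Because the Mme I oligonucleotides contain the degenerate positions R and Y, each is in practice a mixture of sequences. The sketch below expands such a stretch into its concrete variants. It is illustrative only: the function name is hypothetical, and the lookup table covers just the codes defined above plus the four plain bases (N is deliberately left out to keep the output small).

```python
from itertools import product

# Degenerate codes as defined in the text: R = G or A, Y = C or T.
CODES = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT"}


def expand_degenerate(seq):
    """Return every concrete sequence encoded by a sequence containing R and/or Y."""
    choices = [CODES[base] for base in seq.upper()]
    return ["".join(combo) for combo in product(*choices)]


upper_site = expand_degenerate("TCCRAC")  # stretch taken from the Bg-Mme upper-strand oligonucleotides
lower_site = expand_degenerate("GTYGGA")  # start of the Bg-Mme-down oligonucleotide
print(upper_site, lower_site)
```

Each degenerate position doubles the number of variants, so both stretches expand to two sequences.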
Oligonucleotides should be purified by 10% acrylamide gel electrophoresis following standard
techniques, as for the first-strand cDNA primer (Sambrook and Russell, 2001).
Oligonucleotides should then be extracted with phenol/chloroform and chloroform,
and precipitated with 2 volumes of ethanol, as for the first-strand cDNA primer.

Preparation of the linkers.

After checking the OD, mix the Bg-Gsu-GN5, Bg-Gsu-N6 and "down" oligonucleotides at a
ratio of 4:1:5, at a DNA concentration of at least 2 µg/µl; add NaCl to a final concentration of 100 mM. The
oligonucleotides are annealed at 65 °C for 5 min, 45 °C for 5 min, 37 °C for 10 min, and 25 °C for
10 min.
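The 4:1:5 mixing ratio can be turned into individual amounts as sketched below. The text does not say whether the ratio is by mass or by moles; the sketch simply splits a chosen total in proportion, and the 10 µg total is an arbitrary example. The function name is hypothetical.

```python
def split_by_ratio(total, ratio):
    """Split a total amount into parts proportional to the given ratio."""
    parts = sum(ratio)
    return [total * r / parts for r in ratio]


# Order: Bg-Gsu-GN5, Bg-Gsu-N6, "down" oligonucleotide
gn5, n6, down = split_by_ratio(10.0, (4, 1, 5))
print(f"GN5 {gn5}, N6 {n6}, down {down}")
```

Note that the two upper-strand oligonucleotides together (4 + 1) match the amount of the "down" strand (5).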






Ligation of the first-strand cDNA

Use 2 µg of linker mixture for up to 1 µg of single-strand cDNA. Mix linkers and cDNA (final
volume: 5 µl).

Heat at 65 °C for 5 min to melt secondary structures of the single-strand cDNA.
Transfer the linker and cDNA mix to ice.
Add 5 µl of solution II from the TAKARA DNA ligation Kit.
Add 10 µl of solution I of the kit.
Incubate at 10 °C overnight (at least 10 hours).

At the end of the ligation reaction, stop the reaction by adding 0.5 M EDTA, 1 µl of
10% SDS, 1 µl of 10 mg/ml Proteinase K and 10 µl of water, and incubate at 45 °C for 15 min.
Treat with phenol/chloroform and chloroform, and back extract (see appendix) with 60 µl of
column buffer.

After the ligation, remove the excess linker with S-300 spin column chromatography.

1) Shake the column several times and then let it stand upright.
2) Remove the upper cap, then the bottom one.
3) Drain the buffer of the column. Apply 2 ml of the column buffer and drain twice by
gravity.

Put the column into a 15 ml centrifuge tube, then centrifuge at 400 x g for 2 min in a swing-
out rotor at room temperature.

Apply 100 µl of buffer to the column, then centrifuge at 400 x g for 2 min. Check the eluted
volume. If it is different from the input (100 µl), repeat this step until the eluted volume is the
same as the added one.

Set a 1.5 ml tube, after cutting off the cap, into the 15 ml centrifuge tube, and then apply the
sample to the column. Centrifuge at 400 x g for 2 min.

Collect the eluted fraction in a separate tube. Apply 50 µl of buffer to the column, repeat the
centrifugation and collect the fraction in a separate tube.

Repeat step 6 for 3 to 5 more times; keep the eluted fractions separate.

Collected fractions should be counted in a scintillation counter. Usually, pool the first 2-3
fractions (80% of the cpm of the cDNA).
Add NaCl to a final concentration of 0.2 M and precipitate the cDNA by adding an equal volume of
isopropanol.

After precipitation and washing twice with cold 80% ethanol, re-suspend in water.
Second-strand cDNA

Set the 2nd strand cDNA program on the thermal cycler as follows:
Step 1    65 °C for 5 min
Step 2    68 °C for 30 min
Step 3    72 °C for 10 min
Step 4    +4 °C

Procedure for the second-strand cDNA

For the second-strand steps, mix in a test tube:
The cDNA
6 µl of LA-Taq polymerase buffer (Takara)
6 µl of 2.5 mM (each) dNTPs (Takara)
0.5 µl of [alpha-32P] dGTP (optional, to follow the incorporation)

After starting the 2nd strand program, put the tube on the thermal cycler.

Add 3 µl of 5 U/µl LA polymerase, or of an alternative thermostable polymerase cocktail, to the tube
when the samples are at 65 °C, during the first step.

Mix quickly but thoroughly.

At the end of the thermal cycler program, stop the reaction by adding EDTA to 10 mM
(final concentration) and clean up the reaction by Proteinase K treatment, phenol-chloroform
extraction and ethanol precipitation (see Sambrook and Russell, 2001, Molecular Cloning,
CSHL Press, NY).

Cleavage of cDNA

The cDNA should then be cleaved with a Class IIs restriction enzyme, such as the Gsu I used in
this Example.

Buffer (10X) (MBI Fermentas)           10 µl
Gsu I (1 U/µl) (use 5 U/µg DNA)        Y µl
ddH2O                                  X µl
Final volume                           100 µl

Where Y and X vary depending on the quantity of cDNA.
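The variable volumes Y (enzyme) and X (water) in the digest above can be derived from the amount of cDNA as sketched here. The volume of the cDNA sample itself is not stated in the table, so it is treated as an input parameter; that parameter and the function name are assumptions of this sketch.

```python
def gsu_digest_volumes(cdna_ug, sample_volume_ul, total_ul=100.0,
                       buffer_ul=10.0, units_per_ug=5.0, enzyme_units_per_ul=1.0):
    """Return (Y, X): enzyme volume and water volume in microlitres for a 100 ul digest."""
    enzyme_ul = cdna_ug * units_per_ug / enzyme_units_per_ul
    water_ul = total_ul - buffer_ul - enzyme_ul - sample_volume_ul
    if water_ul < 0:
        raise ValueError("components exceed the final volume; reduce the input or scale up")
    return enzyme_ul, water_ul


# Example: 2 ug of cDNA supplied in 20 ul
y_ul, x_ul = gsu_digest_volumes(2.0, 20.0)
print(f"Y = {y_ul} ul Gsu I, X = {x_ul} ul ddH2O")
```

For amounts of cDNA that would need more enzyme than the reaction can hold, the function raises an error instead of returning a negative water volume.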

1) Incubate at 37 °C for 1 hour.
2) Add 2 µl of 0.5 M EDTA.
3) Incubate at 65 °C for 15 min to inactivate the enzyme.

Prepare the magnetic beads

Prepare the appropriate quantity of CPG-MPG (magnetic porous glass) beads. The same
considerations made for the cap-trapper step are valid at this point.
Prepare 200 µl of CPG beads.
Add 5 µg of tRNA (20 mg/ml).
Incubate at RT for 10-20 min or on ice for 30-60 min, with occasional shaking.
Transfer the beads to a magnetic stand for 3 minutes and remove the aqueous phase.
Wash 3 times with 1 M NaCl, 10 mM EDTA; use at least a volume equivalent to the starting
volume of beads.
Re-suspend the beads in a volume of 1 M NaCl, 10 mM EDTA equivalent to the starting volume of beads.

Release of cDNA tags

Mix the washed beads and the Gsu I-cut sample.
Incubate at RT for 15 min with occasional gentle mixing.
Let it stand on the magnetic rack for 3 min.
Recover the supernatant.

Rinse 4X with 500 µl of 1X B&W buffer (binding and washing buffer = 5 mM Tris, pH 7.5,
0.5 mM EDTA, and 1 M NaCl) containing 1X BSA (bovine serum albumin).
Wash 2X with 200 µl of 1X ligase buffer (NEB).

Ligating linkers to bound cDNA: II linker ligation.
In this Example a linker with a recognition site for the restriction enzyme Eco RI is used.
However, the invention is not dependent on or limited to the use of Eco RI in the second linker.
Any other restriction enzyme and its recognition site can be used, depending on their
convenience for cloning the concatemers.

Oligonucleotides to be synthesized:

5'-GAGAGAGAGACTTTAGGTGACACTATAGAAGAGTCCTGAGAATTCNN-3' (SEQ
ID NO: 17)

10 5 ' -P-G AATTCTCAGG ACTCTTCT AT AGT ' (SEQ ID 

NO: 18) 

The oligonucleotides are purified and annealed as described for Linker 1.

Re-suspend the sample in 20 µl of LoTE (1 mM Tris, pH 7.5, and 0.1 mM EDTA) and add linker II (0.4 µg/µl).
Heat the tube at 65 °C for 5 min, then let it sit at room temperature for 15 min.
Add 25 µl of TaKaRa ligation kit II solution II and 50 µl of solution I.
Incubate at 16 °C overnight.

After ligation, wash 4 times with 500 µl of 1X B&W buffer containing 1X BSA.
Wash once with 200 µl of 1X B&W buffer and twice with 200 µl of 1X Bgl II buffer containing 1X
BSA.

Release of cDNA tags using the Tagging Enzyme

Add the following to the sample:

LoTE            X µl
10X buffer      10 µl
Bgl II          Y µl

Make up the volume to a total of 100 µl.
1) Incubate at 37 °C for 1 hour, mixing gently and intermittently.
2) Place on the magnet and collect the supernatant into a new tube. The supernatant contains the
released 5' end fragments.
3) Raise the volume to 200 µl with LoTE.

To 200 µl of sample (the 5' ends, tagged with linkers) add:
133 µl of 7.5 M NH4OAc
3 µl of 1 µg/µl glycogen
340 µl of isopropanol

Incubate at -20 or -80 °C for at least 30 min.

Spin for 20 min at 4 °C at 15,000 rpm in a micro-centrifuge. Remove the supernatant. Wash
the pellet twice with 80% or 70% ethanol. Centrifuge for 3 min at 15,000 rpm and remove
the ethanol wash. At the end, re-suspend in 10 µl of LoTE.



Ligating tags to form di-tags

The 5' ends of cDNAs are ligated to form di-tags.

1) Add 10 µl of TaKaRa ligation Kit II solution II and 20 µl of solution I.
2) Incubate overnight at 16 °C.
3) Add 10 µl of ddH2O, 1 µl of 0.5 M EDTA, 1 µl of 10% SDS and 1 µl of 10 µg/µl
Proteinase K.
4) Incubate at 45 °C for 15 min.
5) Extract once with 1:1 Tris-equilibrated phenol:chloroform and once with chloroform,
including back extraction.
6) Remove the smallest cDNA fragments with a G-50 spun column (size exclusion).
7) Precipitate with isopropanol, adding 5 µg of glycogen as carrier:
100 µl sample
67 µl of 7.5 M NH4OAc
5 µl glycogen
180 µl isopropanol
8) Spin for 20 min at 4 °C.
9) Wash twice with 80% or 70% ethanol, centrifuge and remove the ethanol.

Cleavage of cDNA with anchoring enzyme

1) Re-suspend the sample in 5 µl of LoTE. Then add, in order:
LoTE                            X µl
10X EcoRI restriction buffer    5 µl
EcoRI                           Y µl (use 20 units of EcoRI)
Bring the volume up to a total of 50 µl.
2) Incubate at 37 °C for 1 hour.
3) Add 1 µl of 0.5 M EDTA, 1 µl of 10% SDS and 1 µl of 10 µg/µl Proteinase K.
4) Incubate at 45 °C for 15 min.
5) Extract once with 1:1 Tris-equilibrated phenol:chloroform and once with chloroform,
including back extraction.
6) Precipitate with isopropanol, adding 5 µg of glycogen as carrier:
100 µl sample
67 µl of 7.5 M NH4OAc
5 µl glycogen
180 µl isopropanol
8) Spin for 20 min at 4 °C.
9) Wash twice with 80% or 70% ethanol, centrifuge and remove the ethanol wash each
time.



Ligation of di-tags to form concatemers

1) Re-suspend in 5 µl of LoTE.
2) Add 5 µl of TaKaRa ligation kit II solution II and 10 µl of solution I.
3) Incubate for 1.5 hours at 16 °C.
4) Add 1 µl of 0.5 M EDTA, 1 µl of 10% SDS and 1 µl of 10 µg/µl Proteinase K.
5) Incubate at 45 °C for 15 min.
6) Extract once with 1:1 Tris-equilibrated phenol:chloroform and once with chloroform,
including back extraction.
7) Precipitate with isopropanol, adding 5 µg of glycogen as carrier:
100 µl sample
67 µl of 7.5 M NH4OAc
5 µl glycogen
180 µl isopropanol
8) Spin for 20 min at 4 °C.
9) Wash twice with 80% or 70% ethanol, centrifuge and remove the ethanol.
Dissolve in 5 µl of ddH2O.

The concatemers obtained above are then ligated into a cloning vector such as
pBluescript II KS+ (Stratagene). A large variety of cloning vectors are known in the field
and can be used for the invention.
Standard Ligation:

Mix a three-fold excess of concatemer DNA with 100 ng of an appropriate vector linearized
with Eco RI in a volume of 5 µl. Then add 5 µl of Solution I of the DNA Ligation Kit Ver. 2
(Takara) to the insert/vector mixture. Incubate the tube at 16 °C for 12-16 h.

Transformation:

To remove salt from the ligation solution, precipitate the DNA after the addition of 2 µg of
glycogen (Roche), 20 mM sodium chloride and 80% ethanol. The DNA pellet is washed
twice with 150 µl of 80% ethanol, and the pellet is then dissolved in 10 µl of water. Using
1 µl of the desalted ligation solution, ElectroMAX™ DH10B™ Cells (Invitrogen) are
transformed using a Cell-Porator or a similar device (Biometra) according to the transformation
procedures described in the manufacturer's manual. Transformed bacteria are plated on a
selective medium and grown overnight. Positive clones are then isolated from those plates
for further characterization of the concatemers.

Example 3: Alternative preparation of 5' end specific tags involving the formation of di-tags

The invention can be performed with linkers and restriction enzymes other than those specified in
Examples 1 and 2. In one such embodiment, the invention was performed with the
following changes, where the same protocols were used as specified in the aforementioned
Example 1 if not otherwise noted: RNA samples were prepared as described above and
forwarded to first-strand cDNA synthesis. The resulting cDNA-RNA hybrids were
fractionated by the Cap-Trapper approach, and cDNA transcripts comprising sequences
homologous to the 5' end of mRNA were isolated. Single-stranded cDNA was then ligated to
a different first linker comprised of the following oligonucleotides:

Upper Strand:
Bio-5'-agagagagagcttagatgagagtgaCTCGAGCCTAGGtccaacgNNNNN-3' (SEQ ID NO: 19)
Bio-5'-agagagagagcttagatgagagtgaCTCGAGCCTAGGtccaacNNNNNN-3' (SEQ ID NO:
20)

Lower Strand:
Pi-5'-gttggacctaggctcgagtcactctcatctaagctctctctct-NH2-3' (SEQ ID NO: 21)

The new linker provided recognition sites for the restriction enzymes Xho I (indicated in
capital and underlined), Xma JI (indicated in capital), and the tagging enzyme Mme I
(indicated in italic).

After the ligation of the linker to the cDNA, the second-strand cDNA was prepared, and the
double-stranded DNA was cleaved with Mme I to provide 5' end specific tags. Those tags
were then purified on streptavidin-coated magnetic beads (Dynabeads) before addition of the
second linker. Again, the second linker differed from the linker used in Examples 1 and 2:
it had a distinct Y-shaped structure, as indicated below (SEQ ID NOS: 22 and 23):

                      atcgaaatcccgatctaggctagcg-NH2
P-5'-gaattctacgcctctcg
3'-NNcttaagatgcggagagc
                      gtgaatcgagtttaaggctagcatc-5'

This linker was designed to have an Eco RI restriction site (indicated by underlining) and two
single-stranded overhangs to allow for strand-specific amplification. Note that two
restriction enzymes with distinct cloning sites were used at this point.
After the ligation of the second linker to the 5' end tag, the resulting DNA fragment
comprising the two linkers and one tag was amplified by PCR using the following primers:

XM_cDNA_PCR:
5'-ttagatgagagtgactcgagcctag-3' (SEQ ID NO: 24)
EcoRI_Y2down_PCR:
5'-ctacgatcggaatttgagctaagtg-3' (SEQ ID NO: 25)

The PCR product was amplified directly on the streptavidin-coated beads to which the DNA
templates were bound by means of the biotin-streptavidin interaction. As the PCR
primers did not carry any biotin moieties, the PCR products could be separated directly from
the beads by applying a magnetic force and forwarded to further purification in a 12%
polyacrylamide gel.

The purified PCR products were subsequently cleaved by Xma JI, purified in a 12%
polyacrylamide gel, and self-ligated to form dimeric tags comprising two 5' end specific tags
and overhangs derived from the second linker at both ends. These dimerization products were
further cleaved with Eco RI, and again purified in a 12% polyacrylamide gel before being
concatemerized in a ligation reaction. This final gel purification was essential to separate the
dimeric tags from the DNA fragments cleaved off during the digestion with Eco RI. The
ligation products were fractionated in a 6% polyacrylamide gel, and DNA fragments in the
ranges of 300 to 600 bp and 600 to 4,000 bp were cut out for DNA isolation.

DNA fragments isolated from both fractions were cloned into the Eco RI site of the vector
pZero1.0 (Invitrogen), and transformed bacteria were selected on LB medium containing 50
µg/ml Zeocin (Invitrogen). Positive clones thereof were isolated and further characterized as
described in the Examples below.

Example 4: Sequencing of 5'-end sequence tags
After the titer check, bacterial clones were collected by commercially available picking
machines (Q-bot and Q-pix; Genetix) and transferred to 384-microwell plates. Transformed
E. coli clones holding vector DNA were divided from the 384-microwell plates and grown in
four 96-deepwell plates. After overnight growth, plasmids were extracted either manually
(Itoh M. et al. 1997, Nucleic Acids Res 25:1315-1316) or automatically (Itoh M. et al. 1999,
Genome Res. 9:463-470). Sequences were typically run on a RISA sequencing unit
(Shimadzu, Japan) or a Perkin Elmer-Applied Biosystems ABI 377 in accordance with
standard sequencing methodologies such as described by Shibata K. et al. (Genome Res.
2000 10, 1757-71). Sequencing of concatemers was also performed using primers nested in
the flanking regions of the cloning vector, a BigDye Terminator Cycle Sequencing Ready
Reaction Kit v2.0 (Applied Biosystems) and an ABI3700 (Applied Biosystems) sequencer
according to the manufacturer's product descriptions. Some concatemers were sequenced
from both ends to cover their entire sequence.

Standard primers used for vectors Bluescript and pZero1.0:

M13 Reverse primer: 5'-CAGGAAACAGCTATGAC (SEQ ID NO: 26)

M13 (-20) Forward primer: 5'-GTAAAACGACGGCCAG (SEQ ID NO: 27)

Example 5: Identification of 5'-end sequence tags

The sequences obtained from concatemers are characterized by the structure of the dimeric
tags and the flanking linker sites as presented in Figure 6. Defined regions holding the
recognition sites for the restriction enzymes used during the cloning steps flank each 5' end
specific sequence tag. Therefore the 5' end specific sequence tags can be identified by a
manual sequence analysis or by an automated process using an appropriate computer
program. Individual 5' end specific sequence tags can be stored in a computer file or a
database system.

Initial sequence reads were analyzed by computational means. The individual steps involved
in the sequence analysis are described below, showing the analysis of one read:
0) Original sequence:
>zzb21305i03t3.scf 596 0 596 SCF
TCGTTAACTATTAGGCGAATTGGGCCCTCTAGGTCGACGAGTTCTCAGCAGAGCC
GCCGTCTAGAGCCCCGCCCTCCCGGGCCACCGTCGGACCTAGAATAGTTACTCGA
GGTCTCTCGTCGGACCTAGAGTTTTTCGTATGTTTGTCATCGTCGGACCTAGGTCC
GACGGTCCATTCCTGAGAGTCTCTCTAGGTCCGACGAGAGAGAGAGGATCCTTCT
GTCTAGACCCTGACGCCGGAACCGCACCGTCGGACCTAGGTCCGACGGAAAAGC
AGCTTCCTCCACTCTAGGTCCGACGGTGTGTGTGTGTGTGCGTGTTCTAGAGACT
GGTTCAGATCAAAAGTCGTCGGACCTAGGTCCGACGGGGCTGGTGAGATGGCTC
AGTCTAGATGCATGCTCGAGCGGCCGCCAGTGTGATGGATATCTGCCNAATNCC
AGCACACCGGCGGGCGCNACCAGTGGATCCGAGCCCGGTACCAAGCTTGATGCA
TACGTCGAGTATCCTATACTGTCACCTAAATAGCTTGGGGTAATCATGGTCATAG
CTGTCTCCTGTGTGAAATTGTTATCCGCTCAAAATTCCCAACAACATAG
(SEQ ID NO: 28)

1) pZErO-1 vector portions of the sequences were masked using a program
called "cross_match". X stands for "masked".
>zzb21305i03t3.scf 596 0 596 SCF
TCGTTAXXXXXXXXXXXXXXXXXXXXXXXXXXGTCGACGAGTTCTCAGCAGAG
CCGCCGTCTAGAGCCCCGCCCTCCCGGGCCACCGTCGGACCTAGAATAGTTACTC
GAGGTCTCTCGTCGGACCTAGAGTTTTTCGTATGTTTGTCATCGTCGGACCTAGG
TCCGACGGTCCATTCCTGAGAGTCTCTCTAGGTCCGACGAGAGAGAGAGGATCC
TTCTGTCTAGACCCTGACGCCGGAACCGCACCGTCGGACCTAGGTCCGACGGAA
AAGCAGCTTCCTCCACTCTAGGTCCGACGGTGTGTGTGTGTGTGCGTGTTCTAGA
GACTGGTTCAGATCAAAAGTCGTCGGACCTAGGTCCGACGGGGCTGGTGAGATG

GCTCAGXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXG
2) Look for linker sequences using "cross_match"

Linker sequence according to Example 1: "NCTAGGTCCGAC" (SEQ ID NO: 29)
Linker sequence according to Example 3: "NGTTGGACCTAGGTCCAACN" (SEQ ID NO: 30)

Linkers found using "cross_match" (excerpts from output): 

linker1   TCTAGGTCCGACG   86-98     13-1  C   (SEQ ID NO: 31)
linker2   TCTAGGTCCGACG   118-130   13-1  C
linker3   CCTAGGTCCGACG   151-163   13-1  C   (SEQ ID NO: 32)
linker4   CCTAGGTCCGACG   158-170   1-13
linker5   TCTAGGTCCGACG   190-202   1-13
linker6   CCTAGGTCCGACG   249-261   13-1  C
linker7   CCTAGGTCCGACG   256-268   1-13
linker8   TCTAGGTCCGACG   288-300   1-13
linker9   CCTAGGTCCGACG   347-359   13-1  C
linker10  CCTAGGTCCGACG   354-366   1-13

3) Using the output from "cross_match", the tag extraction program identifies the location and
direction of linkers in the sequences.

------------- means linker in reverse direction

+++++++++++++ means linker in positive direction

----------++++++++++ means dimeric linker (reverse and forward direction)

>zzb21305i03t3 596

TCGTTAXXXXXXXXXXXXXXXXXXXXXXXXXXGTCGACGAGTTCTCAGCA
GAGCCGCCGTCTAGAGCCCCGCCCTCCCGGGCCAC-------------AT
AGTTACTCGAGGTCTCT-------------GTTTTTCGTATGTTTGTCAT
----------++++++++++GTCCATTCCTGAGAGTCTC+++++++++++
++AGAGAGAGAGGATCCTTCTGTCTAGACCCTGACGCCGGAACCGCAC--
--------++++++++++GAAAAGCAGCTTCCTCCAC+++++++++++++
GTGTGTGTGTGTGTGCGTGTTCTAGAGACTGGTTCAGATCAAAAGT----
------++++++++++GGGCTGGTGAGATGGCTCAGXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXG

4) The script looked for the restriction enzyme site at possible locations, for example in a gap
between two linkers (or between a linker and the vector) that is long enough for two tags.

"TCTAGA" for monomer
"GAATTC" for dimer
The site was masked with "******"

>zzb21305i03t3 596

TCGTTAXXXXXXXXXXXXXXXXXXXXXXXXXXGTCGACGAGTTCTCAGCA
GAGCCGCCG******GCCCCGCCCTCCCGGGCCAC-------------AT
AGTTACTCGAGGTCTCT-------------GTTTTTCGTATGTTTGTCAT
----------++++++++++GTCCATTCCTGAGAGTCTC+++++++++++
++AGAGAGAGAGGATCCTTCTG******CCCTGACGCCGGAACCGCAC--
--------++++++++++GAAAAGCAGCTTCCTCCAC+++++++++++++
GTGTGTGTGTGTGTGCGTGT******GACTGGTTCAGATCAAAAGT----
------++++++++++GGGCTGGTGAGATGGCTCAGXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXG

5) The script extracted tags from the sequences that were not masked as vector, linker or
restriction enzyme site. Tags also had to be a) of the right size (19-20 bp) and b) located directly
next to a linker in the right direction (+++++++tag or tag-------).

tag1   20  GTGGCCCGGGAGGGCGGGGC  (SEQ ID NO: 33)
tag2   19  AGAGACCTCGAGTAACTAT   (SEQ ID NO: 34)
tag3   20  ATGACAAACATACGAAAAAC  (SEQ ID NO: 35)
tag4   19  GTCCATTCCTGAGAGTCTC   (SEQ ID NO: 36)
tag5   20  AGAGAGAGAGGATCCTTCTG  (SEQ ID NO: 37)
tag6   20  GTGCGGTTCCGGCGTCAGGG  (SEQ ID NO: 38)
tag7   19  GAAAAGCAGCTTCCTCCAC   (SEQ ID NO: 39)
tag8   20  GTGTGTGTGTGTGTGCGTGT  (SEQ ID NO: 40)
tag9   20  ACTTTTGATCTGAACCAGTC  (SEQ ID NO: 41)
tag10  20  GGGCTGGTGAGATGGCTCAG  (SEQ ID NO: 42)

The following definitions were used to categorize the tags:

- "Good tag" meant:

1) Not a vector sequence (Step 1)
2) Not a linker sequence (Step 2)
3) Not a restriction site (Step 4)
4) Next to a linker with the correct direction (Step 5)
5) Of the right size (19-20 bp) (Step 5)

In future, the quality value will play a role too.

The program outputs linker information, masked sequences and tag sequences.

- "Junk" meant:

When the program/script could not recognize restriction enzyme sites or linker sequences
(because of a bad quality value), the sequences were considered junk. Vector sequences
that were not masked properly (because of a bad quality value) were also considered junk.
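
Steps 2 to 5 and the "good tag" criteria can be illustrated by a minimal Python sketch. It is not the script used for the analysis above. The regular expressions are assumed from the 13-base linker hits shown in the "cross_match" excerpt, the names FWD, REV, SITE, SIZES and extract_tags are hypothetical, and the read is expected to have its vector portions masked with X already. Quality values are not considered.

```python
import re

# Assumed linker patterns: the 13-base hits from the cross_match excerpt
# (forward) and their reverse complement. Illustrative only.
FWD = re.compile(r"[ACGT]CTAGGTCCGACG")
REV = re.compile(r"CGTCGGACCTAG[ACGT]")
SITE = "TCTAGA"      # restriction site between two tags of a monomer
SIZES = (19, 20)     # accepted tag lengths in bp
COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an unmasked DNA string."""
    return seq.translate(COMP)[::-1]


def extract_tags(read):
    """Return (direction, tag) pairs in read order for a vector-masked read."""
    fwd = [m.span() for m in FWD.finditer(read)]
    rev = [m.span() for m in REV.finditer(read)]
    masked = list(read)
    for start, end in fwd + rev:          # mask every linker hit with '#'
        masked[start:end] = "#" * (end - start)
    masked = "".join(masked)

    found = []
    for start, end in fwd:                # tag lies to the right of the linker
        gap = re.match(r"[ACGT]*", masked[end:]).group()
        if len(gap) in SIZES:
            found.append((end, "F", gap))
            continue
        for n in SIZES:                   # two tags separated by the site
            if gap[n:n + len(SITE)] == SITE:
                found.append((end, "F", gap[:n]))
                break
    for start, end in rev:                # tag lies to the left of the linker
        gap = re.search(r"[ACGT]*$", masked[:start]).group()
        if len(gap) in SIZES:
            found.append((start - len(gap), "R", revcomp(gap)))
            continue
        for n in SIZES:
            if gap[-n - len(SITE):-n] == SITE:
                found.append((start - n, "R", revcomp(gap[-n:])))
                break
    return [(d, t) for _, d, t in sorted(found)]
```

Unmasked stretches that are neither 19-20 bp long nor separated from a second tag by the restriction site are simply not reported, which corresponds to the "junk" category. Tags to the left of a reverse linker are returned as reverse complements, as in the tag list of step 5.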

Below, the output of a computer-based analysis of a sequencing read is given. The sequence
read was obtained from a clone prepared according to the protocol given in Example 1. Note
that XmaJI and XbaI create the same overhang after digestion, and therefore in this example
sequence many linker sites are derived from recombined XmaJI/XbaI sites. The program
identified linker sites as indicated by symbols and highlighted the 5' end specific sequence
tags as described above. Note that in the list of 5' end specific tags given below, the program
automatically removes the first base, as this position is prone to artifacts due to the
template-free activity of the reverse transcriptase.

>zzb21106i09t3.scf 569 (monomer)

CATTAGGGGATTGGGCCC+++++++++++++GTACCTCCTCGCATCCCGC
******ACCTTCGACACGCACACCAC----------++++++++++ATGG
ACCGAGGGCCCCAGCC+++++++++++++CGGATCGGGTGGGTCGGAC**
****ACGAACTGCTGCGACCTCT-------------CACAGCGCCGGCTC
CGGAGA-------------CTCGGAGCCTGCAAAGTCT------------
-TCCGGCGCTGCGGCAGCTCC-------------GCGACCAGGTCCGACG
GTGT-------------GACTCTGGGCGAGAACGTCT----------+++
+++++++GCCGTTCCTTGCTTGCTGGA******CTGAGCTAAATCCCCAA
CCC----------++++++++++GAGTAACTATAACGGTCCT******GC
GAGCTCCAGGCGGAATC-------------ACCCGGGGGGCGGGACTAAC
CGTCGGAC+++++++++++++AGGGACCGCTGCGGTCCGXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXN

linker1   19   31
linker2   77   89   C
linker3   84   96
linker4   117  129
linker5   174  186  C
linker6   207  219  C
linker7   239  251  C
linker8   272  284  C
linker9   305  317  C
linker10  338  350  C
linker11  345  357
linker12  404  416  C
linker13  411  423
linker14  468  480  C
linker15  509  521

tag1   F  19  GTACCTCCTCGCATCCCGC   (SEQ ID NO: 43)
tag2   R  20  GTGGTGTGCGTGTCGAAGGT  (SEQ ID NO: 44)
tag3   F  20  ATGGACCGAGGGCCCCAGCC  (SEQ ID NO: 45)
tag4   F  19  CGGATCGGGTGGGTCGGAC   (SEQ ID NO: 46)
tag5   R  19  AGAGGTCGCAGCAGTTCGT   (SEQ ID NO: 47)
tag6   R  20  TCTCCGGAGCCGGCGCTGTG  (SEQ ID NO: 48)
tag7   R  19  AGACTTTGCAGGCTCCGAG   (SEQ ID NO: 49)
tag8   R  20  GGAGCTGCGGCAGCGCCGGA  (SEQ ID NO: 50)
tag9   R  20  ACACCGTCGGACCTGGTCGC  (SEQ ID NO: 51)
tag10  R  20  AGACGTTCTCGCCCAGAGTC  (SEQ ID NO: 52)
tag11  F  20  GCCGTTCCTTGCTTGCTGGA  (SEQ ID NO: 53)
tag12  R  20  GGGTTGGGGATTTAGCTCAG  (SEQ ID NO: 54)
tag13  F  19  GAGTAACTATAACGGTCCT   (SEQ ID NO: 55)
tag14  R  19  GATTCCGCCTGGAGCTCGC   (SEQ ID NO: 56)
tag15  F  18  AGGGACCGCTGCGGTCCG    (SEQ ID NO: 57)

zzb21106i09t3  junk  18  CATTAGGGGATTGGGCCC  (SEQ ID NO: 58)
zzb21106i09t3  junk  28  ACCCGGGGGGCGGGACTAACCGTCGGAC  (SEQ ID NO: 59)
zzb21106i09t3  junk  1   N

As in the example shown above, the sequence example given below was derived from a
concatemer prepared according to Example 3 and analyzed with the same software
as described above.

>zzc20401c11t3 607 (dimer)

TGATAAGGCAATGGCCTCTAATGCTGXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXGCCGCCGCGCCTTCCGCGTC----------++++++++
++GAGGGCCGCCGCCCGCCCTCC******AGTTTTTTTTTTTTTTTTG--
--------++++++++++GGGCAGAGCGAGCAGAGCCT******GTCTGT
CAGAATCAGAAGT----------++++++++++GCTTTGCAGACGCCACT
GT******AAAGTCCACCTGGACTTTCC----------++++++++++CG
TGCGCGGCCTCGGCGGC******AACTCTGTTATACACTAAC--------
--++++++++++AGAGACTGAACAGCGGGCGA******CAGCCATCTTGC
CCCACCT----------++++++++++GCTTGCCTTCTGGCCATGCC***
***CCCCCCTCTATGCGTGCGTC----------++++++++++AGTGTGG
CTGTTCCATGGNXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXG

linker1  83-102   1-20
linker2  149-168  1-20
linker3  214-233  1-20
linker4  279-298  1-20
linker5  343-362  1-20
linker6  408-427  20-1  C
linker7  474-493  20-1  C

tag1   20  GACGCGGAAGGCGCGGCGGC   (SEQ ID NO: 60)
tag2   21  GGAGGGCGGGCGGCGGCCCTC  (SEQ ID NO: 61)
tag3   19  CAAAAAAAAAAAAAAAACT    (SEQ ID NO: 62)
tag4   20  AGGCTCTGCTCGCTCTGCCC   (SEQ ID NO: 63)
tag5   19  ACTTCTGATTCTGACAGAC    (SEQ ID NO: 64)
tag6   19  ACAGTGGCGTCTGCAAAGC    (SEQ ID NO: 65)
tag7   20  GGAAAGTCCAGGTGGACTTT   (SEQ ID NO: 66)
tag8   19  GCCGCCGAGGCCGCGCAGG    (SEQ ID NO: 67)
tag9   19  GTTAGTGTATAACAGAGTT    (SEQ ID NO: 68)
tag10  20  TCGCCCGCTGTTCAGTCTCT   (SEQ ID NO: 69)
tag11  19  AGGTGGGGCAAGATGGCTG    (SEQ ID NO: 70)
tag12  20  GGCATGGCCAGAAGGCAAGC   (SEQ ID NO: 71)
tag13  20  GACGCACGCATAGAGGGGGG   (SEQ ID NO: 72)
tag14  19  NCCATGGAACAGCCACACT    (SEQ ID NO: 73)

junk1  26  TGATAAGGCAATGGCCTCTAATGCTG  (SEQ ID NO: 74)
junk2  1   G

Note that in both example sequence reads the length of the 5' end specific tags varies,
because MmeI cuts with some frequency at a shorter distance and thus yields shorter DNA
fragments. A statistical analysis of 5' end specific tags showed that in the examples about
45% of the tags had a length of 21 bp and an additional 44% of the tags had a length of 20 bp.
Some variation in sequence length has also been seen with the Class IIS enzyme GsuI, though
in about 92% of the cases 16 bp DNA fragments were obtained.
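
A length statistic of this kind can be computed directly from a list of extracted tags. The following short Python sketch is only an illustration; the function name length_distribution is hypothetical and the four tags in the usage note are taken from the tag lists above, not from the full data set behind the percentages quoted in the text.

```python
from collections import Counter


def length_distribution(tags):
    """Map each observed tag length to its fraction of all tags."""
    counts = Counter(len(tag) for tag in tags)
    total = sum(counts.values())
    return {length: counts[length] / total for length in sorted(counts)}
```

For example, calling length_distribution on GTACCTCCTCGCATCCCGC, GTGGTGTGCGTGTCGAAGGT, ATGGACCGAGGGCCCCAGCC and AGGGACCGCTGCGGTCCG gives one quarter of the tags at 18 bp, one quarter at 19 bp and one half at 20 bp.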

Example 6: Characterization of 5'-end sequence tags

5' end specific sequence tags can be analyzed for their identity by standard software
solutions for sequence alignment like NCBI BLAST
(http://www.ncbi.nlm.nih.gov/BLAST/), FASTA, available in the Genetics Computer Group
(GCG) package from Accelrys Inc. (http://www.accelrys.com/), or the like. Such software
solutions allow for an alignment of 5' end specific sequence tags among one another to
identify unique or non-redundant tags, which can be further used in:
- Database searches and building a 5'-end sequence database
- Gene identification using a 5'-end sequence database

An example of a BLAST search in GenBank using a 5' end specific tag is given below. The
16 bp tag (5'-ACC TCC CTC CGC GGA G) (SEQ ID NO: 75) is derived from the 5' end of
human TGF-b1: JBC 264 (1989) 402-408.

Query= (16 letters) (ACCTCCCTCCGCGGAG)

Database: All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS,
GSS, or phase 0, 1 or 2 HTGS sequences)
1,205,903 sequences; 5,297,768,116 total letters
                                                                  Score    E
Sequences producing significant alignments:                       (bits) Value

gi|10863872|ref|NM_000660.1| Homo sapiens transforming grow...      32    1.1
gi|18590091|ref|XM_085882.1| Homo sapiens similar to transf...      32    1.1
gi|11424057|ref|XM_008912.1| Homo sapiens transforming grow...      32    1.1
gi|7684381|gb|AC011462.4|AC011462 Homo sapiens chromosome 1...      32    1.1
gi|15027087|emb|AO89894yl|LMFL,CHR4 A Leishmania major Fried...     32    1.1
gi|1943914|gb|U70540.1|LMU70540 Leishmania mexicana amazone...      32    1.1
gi|37097|emb|X05839.1|HSTGFBG1 Human transforming growth fa...      32    1.1
gi|37092|emb|X02812.1|HSTGFB1 Human mRNA for transforming g...      32    1.1
gi|340526|gb|J04431.1|HUMTGFB1PR Homo sapiens transforming ...      32    1.1
Alignments 

>gi|10863872|ref|NM_000660.1| Homo sapiens transforming growth factor, beta 1
(Camurati-Engelmann disease) (TGFB1), mRNA
Length = 2745

Score = 32.2 bits (16), Expect = 1.1
Identities = 16/16 (100%)
Strand = Plus / Plus

Query: 1 acctccctccgcggag 16
         ||||||||||||||||
Sbjct: 1 acctccctccgcggag 16

>gi|18590091|ref|XM_085882.1| Homo sapiens similar to transforming growth factor, beta 1
(H. sapiens) (LOC147760), mRNA
Length = 697

Score = 32.2 bits (16), Expect = 1.1
Identities = 16/16 (100%)
Strand = Plus / Plus

Query: 1 acctccctccgcggag 16
         ||||||||||||||||
Sbjct: 7 acctccctccgcggag 22

>gi|11424057|ref|XM_008912.1| Homo sapiens transforming growth factor, beta 1 (TGFB1),
mRNA
Length = 2741

Score = 32.2 bits (16), Expect = 1.1
Identities = 16/16 (100%)
Strand = Plus / Plus

Query: 1 acctccctccgcggag 16
         ||||||||||||||||
Sbjct: 1 acctccctccgcggag 16

Database: All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS, GSS,
or phase 0, 1 or 2 HTGS sequences)
Posted date: Apr 9, 2002 10:59 AM
Number of letters in database: 1,002,800,820
Number of sequences in database: 1,205,903

Lambda   K       H
1.37     0.711   1.31

Gapped
Lambda   K       H
1.37     0.711   1.31

Matrix: blastn matrix:1 -3
Gap Penalties: Existence: 5, Extension: 2
Number of Hits to DB: 6901
Number of Sequences: 1205903
Number of extensions: 6901
Number of successful extensions: 1479
Number of sequences better than 10.0: 16
length of query: 16
length of database: 5,297,768,116
effective HSP length: 15
effective length of query: 1
effective length of database: 5,279,679,571
effective search space: 5279679571
effective search space used: 5279679571
T: 0
A: 30
X1: 6 (11.9 bits)
X2: 15 (29.7 bits)
S1: 12 (24.3 bits)
S2: 15 (30.2 bits)
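
The Expect value reported for the 16 bp tag can be related to the bit score and the effective search space given in the statistics above. The following Python sketch uses the standard BLAST relation E = (search space) x 2^(-bit score); the function name expect_value is hypothetical, and the calculation is meant only as a plausibility check of the reported numbers, not as a replacement for the BLAST program.

```python
def expect_value(bit_score, search_space):
    """Expected number of chance hits scoring at least bit_score."""
    return search_space * 2.0 ** (-bit_score)


# Values taken from the BLAST report above: 32.2 bits for the 16/16 match
# and an effective search space of 5279679571.
e_value = expect_value(32.2, 5279679571)
```

With these inputs e_value is approximately 1.07, consistent with the reported "Expect = 1.1". An Expect value near 1 means that a perfect 16 bp match is expected about once by chance in a database of this size, which is why longer tags (such as the 20 bp tags with Expect = 0.007 shown further below) are more informative.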






LOCUS       NM_000660    2745 bp    mRNA    linear   PRI 13-FEB-2002
DEFINITION  Homo sapiens transforming growth factor, beta 1 (Camurati-Engelmann
            disease) (TGFB1), mRNA.
ACCESSION   NM_000660
VERSION     NM_000660.1  GI:10863872
KEYWORDS    .
SOURCE      human.
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 2745)
  AUTHORS   Derynck,R., Jarrett,J.A., Chen,E.Y., Eaton,D.H., Bell,J.R.,
            Assoian,R.K., Roberts,A.B., Sporn,M.B. and Goeddel,D.V.
  TITLE     Human transforming growth factor-beta complementary DNA sequence
            and expression in normal and transformed cells
  JOURNAL   Nature 316 (6030), 701-705 (1985)
  MEDLINE   85296301
REFERENCE   2  (bases 1 to 2745)
  AUTHORS   Sporn,M.B., Roberts,A.B., Wakefield,L.M. and Assoian,R.K.
  TITLE     Transforming growth factor-beta: biological function and chemical
            structure
  JOURNAL   Science 233 (4763), 532-534 (1986)
  MEDLINE   86261803
  PUBMED    3487831
REFERENCE   3  (bases 1 to 2745)
  AUTHORS   Chang,N.S., Mattison,J., Cao,H., Pratt,N., Zhao,Y. and Lee,C.
  TITLE     Cloning and characterization of a novel transforming growth
            factor-beta1-induced TIAF1 protein that inhibits tumor necrosis
            factor cytotoxicity
  JOURNAL   Biochem. Biophys. Res. Commun. 253 (3), 743-749 (1998)
  MEDLINE   99119079
  PUBMED    9918798
REFERENCE   4  (bases 1 to 2745)
  AUTHORS   Ghadami,M., Makita,Y., Yoshida,K., Nishimura,G., Fukushima,Y.,
            Wakui,K., Ikegawa,S., Yamada,K., Kondo,S., Niikawa,N. and Tomita,H.
  TITLE     Genetic mapping of the Camurati-Engelmann disease locus to
            chromosome 19q13.1-q13.3
  JOURNAL   Am. J. Hum. Genet. 66 (1), 143-147 (2000)
  MEDLINE   20100617
  PUBMED    10631145
REFERENCE   5  (bases 1 to 2745)
  AUTHORS   Vaughn,S.P., Broussard,S., Hall,C.R., Scott,A., Blanton,S.H.,
            Milunsky,J.M. and Hecht,J.T.
  TITLE     Confirmation of the mapping of the Camurati-Englemann locus to
            19q13.2 and refinement to a 3.2-cM region
  JOURNAL   Genomics 66 (1), 119-121 (2000)
  MEDLINE   20304762
  PUBMED    10843814
REFERENCE   6  (bases 1 to 2745)
  AUTHORS   Lim,J.M., Kim,J.A., Lee,J.H. and Joo,C.K.
  TITLE     Downregulated expression of integrin alpha6 by transforming growth
            factor-beta(1) on lens epithelial cells in vitro
  JOURNAL   Biochem. Biophys. Res. Commun. 284 (1), 33-41 (2001)
  MEDLINE   21268957
  PUBMED    11374867
COMMENT     PROVISIONAL REFSEQ: This record has not yet been subject to final
            NCBI review. The reference sequence was derived from X02812.1.
FEATURES             Location/Qualifiers
     source          1..2745
                     /organism="Homo sapiens"
                     /db_xref="taxon:9606"
                     /chromosome="19"
                     /map="19q13.1"
     gene            1..2745
                     /gene="TGFB1"
                     /note="TGFB; DPD1; CED"
                     /db_xref="LocusID:7040"
                     /db_xref="MIM:190180"
     misc_feature    37..113
                     /note="pot. hairpin loops-forming region"
     variation       72
                     /allele="-"
                     /allele="C"
                     /db_xref="dbSNP:1800999"
     variation       79
                     /allele="-"
                     /allele="C"
                     /db_xref="dbSNP:1799753"
     CDS             842..2017
                     /gene="TGFB1"
                     /note="transforming growth factor, beta 1; diaphyseal
                     dysplasia 1, progressive (Camurati-Engelmann disease)"
                     /codon_start=1
                     /db_xref="LocusID:7040"
                     /db_xref="MIM:190180"
                     /product="transforming growth factor, beta 1
                     (Camurati-Engelmann disease)"
                     /protein_id="NP_000651.1"
                     /db_xref="GI:10863873"
                     /translation="MPPSGLRLLPLLLPLLWLLVLTPGPPAAGLSTCKTIDMELVKRKRIEAIR
                     GQILSKLRLASPPSQGEVPPGPLPEAVLALYNSTRDRVAGESAEPEPEPEADYYAKEV
                     TRVLMVETHNEIYDKFKQSTHSIYMFFNTSELREAVPEPVLLSRAELRLLRRLKLKVE
                     QHVELYQKYSNNSWRYLSNRLLAPSDSPEWLSFDVTGVVRQWLSRGGEIEGFRLSA
                     HCSCDSRDNTLQVDINGFTTGRRGDLATIHGMNRPFLLLMATPLERAQHLQSSRHRR
                     ALDTNYCFSSTEKNCCVRQLYIDFRKDLGWKWIHEPKGYHANFCLGPCPYIWSLDT
                     QYSKVLALYNQHNPGASAAPCCVPQALEPLPIVYYVGRKPKVEQLSNMIVRSCKCS"
                     (SEQ ID NO: 77)

     misc_feature    863..910
                     /note="pot. core sequence of signal peptide (aa -272 to
                     -257)"
     variation       870
                     /allele="C"
                     /allele="T"
                     /db_xref="dbSNP:1982073"
     variation       915
                     /allele="C"
                     /allele="G"
                     /db_xref="dbSNP:1800471"
     misc_feature    938..1600
                     /note="TGFb_propeptide; Region: TGF-beta propeptide"
     misc_feature    953
                     /note="pot. altern. translation start site"
     misc_feature    1035..1043
                     /note="put. glycosylation site"
     misc_feature    1247..1255
                     /note="put. glycosylation site"
     misc_feature    1370..1378
                     /note="put. glycosylation site"
     variation       1632
                     /allele="C"
                     /allele="T"
                     /db_xref="dbSNP:1800472"
     mat_peptide     1679..2014
                     /product="mature TGF-beta (aa 1-112)"
     misc_feature    1715..2014
                     /note="TGF-beta; Region: Transforming growth factor beta
                     like domain"
     misc_feature    1721..2014
                     /note="TGFB; Region: Transforming growth factor-beta
                     (TGF-beta) family"
     misc_feature    2018..2096
                     /note="GC-rich region"
     promoter        2097..2103
                     /note="TATA-box-like region"
     misc_feature    2517..2522
                     /note="put. polyadenylation signal"
     polyA_site      2539
                     /note="polyadenylation site"
BASE COUNT      527 a    938 c    801 g    479 t
ORIGIN 

1 acctccctcc gcggagcagc cagacagcga gggccccggc cgggggcagg ggggacgccc
61 cgtccggggc accccccccg gctctgagcc gcccgcgggg ccggcctcgg cccggagcgg
121 aggaaggagt cgccgaggag cagcctgagg ccccagagtc tgagacgagc cgccgccgcc

181 cccgccactg cggggaggag ggggaggagg agcgggagga gggacgagct ggtcgggaga 
241 agaggaaaaa aacttttgag acttttccgt tgccgctggg agccggaggc gcggggacct 
301 cttggcgcga cgctgccccg cgaggaggca ggacttgggg accccagacc gcctcccttt 
361 gccgccgggg acgcttgctc cctccctgcc ccctacacgg cgtccctcag gcgcccccat 

421 tccggaccag ccctcgggag tcgccgaccc ggcctcccgc aaagactttt ccccagacct

481 cgggcgcacc ccctgcacgc cgccttcatc cccggcctgt ctcctgagcc cccgcgcatc 
541 ctagaccctt tctcctccag gagacggatc tctctccgac ctgccacaga tcccctattc 
601 aagaccaccc accttctggt accagatcgc gcccatctag gttatttccg tgggatactg 
661 agacaccccc ggtccaagcc tcccctccac cactgcgccc ttctccctga ggagcctcag 

721 ctttccctcg aggccctcct accttttgcc gggagacccc cagcccctgc aggggcgggg

781 cctccccacc acaccagccc tgttcgcgct ctcggcagtg ccggggggcg ccgcctcccc 
841 catgccgccc tccgggctgc ggctgctgcc gctgctgcta ccgctgctgt ggctactggt 
901 gctgacgcct ggcccgccgg ccgcgggact atccacctgc aagactatcg acatggagct 
961 ggtgaagcgg aagcgcatcg aggccatccg cggccagatc ctgtccaagc tgcggctcgc 

1021 cagccccccg agccaggggg aggtgccgcc cggcccgctg cccgaggccg tgctcgccct
1081 gtacaacagc acccgcgacc gggtggccgg ggagagtgca gaaccggagc ccgagcctga 
1141 ggccgactac tacgccaagg aggtcacccg cgtgctaatg gtggaaaccc acaacgaaat 
1201 ctatgacaag ttcaagcaga gtacacacag catatatatg ttcttcaaca catcagagct 
1261 ccgagaagcg gtacctgaac ccgtgttgct ctcccgggca gagctgcgtc tgctgaggag 

1321 gctcaagtta aaagtggagc agcacgtgga gctgtaccag aaatacagca acaattcctg
1381 gcgatacctc agcaaccggc tgctggcacc cagcgactcg ccagagtggt tatcttttga 


1441 tgtcaccgga gttgtgcggc agtggttgag ccgtggaggg gaaattgagg gctttcgcct 
1501 tagcgcccac tgctcctgtg acagcaggga taacacactg caagtggaca tcaacgggtt 
1561 cactaccggc cgccgaggtg acctggccac cattcatggc atgaaccggc ctttcctgct 
1621 tctcatggcc accccgctgg agagggccca gcatctgcaa agctcccggc accgccgagc 
1681 cctggacacc aactattgct tcagctccac ggagaagaac tgctgcgtgc ggcagctgta
1741 cattgacttc cgcaaggacc tcggctggaa gtggatccac gagcccaagg gctaccatgc 
1801 caacttctgc ctcgggccct gcccctacat ttggagcctg gacacgcagt acagcaaggt 
1861 cctggccctg tacaaccagc ataacccggg cgcctcggcg gcgccgtgct gcgtgccgca 
1921 ggcgctggag ccgctgccca tcgtgtacta cgtgggccgc aagcccaagg tggagcagct 

1981 gtccaacatg atcgtgcgct cctgcaagtg cagctgaggt cccgccccgc cccgccccgc

2041 cccggcaggc ccggccccac cccgccccgc ccccgctgcc ttgcccatgg gggctgtatt 
2101 taaggacacc gtgccccaag cccacctggg gccccattaa agatggagag aggactgcgg 
2161 atctctgtgt cattgggcgc ctgcctgggg tctccatccc tgacgttccc ccactcccac 
2221 tccctctctc tccctctctg cctcctcctg cctgtctgca ctattccttt gcccggcatc
2281 aaggcacagg ggaccagtgg ggaacactac tgtagttaga tctatttatt gagcaccttg
2341 ggcactgttg aagtgcctta cattaatgaa ctcattcagt caccatagca acactctgag
2401 atggcaggga ctctgataac acccatttta aaggttgagg aaacaagccc agagaggtta 
2461 agggaggagt tcctgcccac caggaacctg ctttagtggg ggatagtgaa gaagacaata 
2521 aaagatagta gttcaggcca ggcggggtgc tcacgcctgt aatcctagca cttttgggag 

2581 gcagagatgg gaggatactt gaatccaggc atttgagacc agcctgggta acatagtgag
2641 accctatctc tacaaaacac ttttaaaaaa tgtacacctg tggtcccagc tactctggag 
2701 gctaaggtgg gaggatcact tgatcctggg aggtcaaggc tgcag 



// 

(SEQ ID NO: 76)

Revised: October 24, 2001.



BLAST search in the NCBI database using some tags from Example 6. Only the hits with the
highest score are shown:

Tag sequence for query: GTGGTGTGCGTGTCGAAGGT
Result:

                                                                  Score    E
Sequences producing significant alignments:                       (bits) Value

gi|265568|gb|S54914.1| Mus musculus BUP (bup) gene, complete ...    40   0.007
gi|24430261|emb|AL928680.5| Mouse DNA sequence from clone R...      40   0.007
gi|22797896|emb|AL158211.29| Human DNA sequence from clone ...      40   0.007

10 

>gi|265568|gb|S54914.1| Mus musculus BUP (bup) gene, complete cds
Length = 2022

Score = 40.1 bits (20), Expect = 0.007
Identities = 20/20 (100%)
Strand = Plus / Plus

Query: 1   gtggtgtgcgtgtcgaaggt 20
           ||||||||||||||||||||
Sbjct: 968 gtggtgtgcgtgtcgaaggt 987



>gi|24430261|emb|AL928680.5| Mouse DNA sequence from clone RP23-396N6 on
chromosome 2, complete
sequence
Length = 217726

Score = 40.1 bits (20), Expect = 0.007
Identities = 20/20 (100%)
Strand = Plus / Plus

Query: 1     gtggtgtgcgtgtcgaaggt 20
             ||||||||||||||||||||
Sbjct: 19552 gtggtgtgcgtgtcgaaggt 19571



>gi|22797896|emb|AL158211.29| Human DNA sequence from clone RP11-573G6
on chromosome 10, complete
sequence
Length = 138094

Score = 40.1 bits (20), Expect = 0.007
Identities = 20/20 (100%)
Strand = Plus / Plus

Query: 1     gtggtgtgcgtgtcgaaggt 20
             ||||||||||||||||||||
Sbjct: 71390 gtggtgtgcgtgtcgaaggt 71409

Tag sequence for query: GACGCGGAAGGCGCGGCGGC
Result:

                                                                  Score    E
Sequences producing significant alignments:                       (bits) Value

gi|28913518|gb|BC048682.1| Mus musculus, dystrobrevin bindi...      40   0.007
>gi|28913518|gb|BC048682.1| Mus musculus, dystrobrevin binding protein 1, clone
IMAGE:6515997, mRNA, partial cds
Length = 1384

Score = 40.1 bits (20), Expect = 0.007
Identities = 20/20 (100%)
Strand = Plus / Plus

Query: 1  gacgcggaaggcgcggcggc 20
          ||||||||||||||||||||
Sbjct: 36 gacgcggaaggcgcggcggc 55



Example 7: Mapping of 5' end specific tags to the genome

5' end specific sequence tags obtained as described in this Example can be used to identify
transcribed regions within genomes for which partial or entire sequences were obtained. Such
a search can be performed using standard software solutions like NCBI BLAST
(http://www.ncbi.nlm.nih.gov/BLAST/) to align the 5' end specific sequence tags to genomic
sequences. In the case of large genomes like those from human, rat or mouse it may be
necessary to extend the initial sequence information obtained from concatemers. The use of
extended sequences allows for a more precise identification of actively transcribed regions in
the genome.

In another example, 5' end tags from concatemers prepared according to Examples 1 and 3
were further analyzed by mapping to the mouse genome. For this example, a library of 5' end
tags was prepared from total brain of adult mice according to Example 1 and from whole
17.5-day mouse embryos according to Example 3. Tag sequences were obtained from
sequence reads by computational means as described in Example 5. Sequence tags were
mapped to the mouse genome with a threshold of at least 18 bp matches and using penalties
for mismatches or gaps. The table given below summarizes the results:



Type        # Tags Used   Mapped   Single Site   Redundancy
Example 1   8,624         5,185    4,308         3,401
Example 3   3,005         2,313    1,836         283



Statistical analysis and comparison to known genes indicated that about 89% of the sites are
most likely true start sites of transcription.
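
The proportions implied by the table can be made explicit with a few lines of Python. This is only an arithmetic illustration of the published counts; the names TABLE and mapping_rates are hypothetical, and the "Redundancy" column is left out because its exact definition is not given here.

```python
# Counts copied from the table above: (tags used, mapped, single site).
TABLE = {
    "Example 1": (8624, 5185, 4308),
    "Example 3": (3005, 2313, 1836),
}


def mapping_rates(used, mapped, single_site):
    """Return (fraction of tags mapped, fraction of mapped tags at one site)."""
    return mapped / used, single_site / mapped


rates = {name: mapping_rates(*counts) for name, counts in TABLE.items()}
```

Under this reading of the columns, roughly 60% of the Example 1 tags and 77% of the Example 3 tags could be mapped, and in both libraries about 80% of the mapped tags matched a single genomic site.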



Example 8: Statistical analysis of 5' end sequence tags

5' end sequence tags obtained from the same plurality of mRNAs in a sample or nucleic acid
fragments within the same cDNA library can be analyzed by a standard software solution like
NCBI BLAST (http://www.ncbi.nlm.nih.gov/BLAST/) to identify non-redundant sequence
tags as described in Example 5. All such non-redundant sequence tags can then be individually
counted and further analyzed for the contribution of each non-redundant tag to the total
number of all tags obtained from the same sample. The contribution of an individual tag to
the total number of all tags should allow for a quantification of the transcripts in a plurality of
mRNAs in the sample or a cDNA library. The results obtained in such a way on individual
samples can be further compared with similar data obtained from other samples to compare
their expression patterns.
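
The counting step can be sketched in Python as follows. This is a simplified illustration that treats only identical sequences as the same tag, whereas an alignment-based comparison as mentioned above could also merge near-identical tags; the function name tag_frequencies is hypothetical.

```python
from collections import Counter


def tag_frequencies(tags):
    """Map each distinct tag to (count, fraction of all tags), most frequent first."""
    counts = Counter(tags)
    total = sum(counts.values())
    return {tag: (n, n / total) for tag, n in counts.most_common()}
```

The fractions obtained for one sample can then be compared tag by tag with the fractions obtained for another sample to look for differences in expression.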

Example 9: Identification of transcriptional start sites

5' end specific sequence tags that can be mapped to genomic sequences allow for the
identification of regulatory sequences. In a gene, the DNA upstream of the 5' end of
transcribed regions usually encompasses most of the regulatory elements, which are used in
the control of gene expression. These regulatory sequences can be further analyzed for their
functionality by searches in databases which hold information on binding sites for
transcription factors. Publicly available databases on transcription factor binding sites and for
promoter analysis include:

Transcription Regulatory Region Database (TRRD)
(http://wwwmgs.bionet.nsc.ru/mgs/dbases/trrd4/)
TRANSFAC (http://transfac.gbf.de/TRANSFAC/)
TFSEARCH (http://www.cbrc.jp/research/db/TFSEAR...)
PromoterInspector provided by Genomatix Software (http://www.genomatix.de/)
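
Before such a database search, the region upstream of a mapped tag has to be cut out of the genomic sequence. The following Python sketch shows one way to do this; the function name upstream_region, the 0-based coordinate convention and the default length of 500 bp are assumptions made for illustration and are not part of the method described here.

```python
COMP = str.maketrans("ACGT", "TGCA")


def upstream_region(chrom_seq, tss, strand, length=500):
    """Return up to `length` bases upstream of a transcription start site.

    chrom_seq is the plus-strand sequence, tss the 0-based plus-strand index
    of the first transcribed base, and strand is "+" or "-". The result is
    always written 5' to 3' on the transcribed strand.
    """
    if strand == "+":
        return chrom_seq[max(0, tss - length):tss]
    region = chrom_seq[tss + 1:tss + 1 + length]
    return region.translate(COMP)[::-1]
```

The returned sequence could then be submitted to one of the promoter or transcription factor binding site resources listed above.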


Example 10: Cloning of full-length cDNAs using information derived from 5' end sequence
tags

Sequence information derived from the concatemers can be used to synthesize specific
primers for the cloning of full-length cDNAs. In such an approach, the sequence derived
from a given 5' end specific tag can be used to design a forward primer, while the choice of
the reverse primer depends on the template DNA used in the amplification
reaction. Amplification by the polymerase chain reaction (PCR) can be performed using a
template derived from a plurality of RNA obtained from a biological sample and an oligo-dT
primer. In the first step, the oligo-dT primer and a reverse transcriptase are used to synthesize
a cDNA pool. In the second step, a forward primer derived from a 5' end specific tag and an
oligo-dT primer are used to amplify a full-length cDNA from the cDNA pool. Similarly, a
specific full-length cDNA can be amplified from an existing cDNA library using a forward
primer derived from a 5' end tag and a vector-nested reverse primer.

Example 11: Alternative approaches for the cloning of 5'-end tags from cDNA libraries

A plurality of cDNAs can be amplified from an existing cDNA library having a recognition
site for a class IIs endonuclease at the 5' end of the inserts. The PCR products derived from
such a library would be further treated as described in the examples herein.

Example 12: Cloning of 5' ends by replacement of the Cap structure by an oligonucleotide
having a class IIs recognition site

A cDNA/RNA hybrid encompassing the 5' end of an initial transcript can be obtained as
described in Examples 1 to 3. The Cap structure in such cDNA/RNA hybrids is then
enzymatically removed by a hydrolyzing enzyme such as the T4 polynucleotide kinase or the
tobacco acid pyrophosphatase. A single or double-stranded oligonucleotide having a class IIs
recognition site is then ligated by T4 RNA ligase to the RNA at the phosphate present at the
5' end of the de-capped mRNA. The ligated oligonucleotide will function as a primer for the
second strand synthesis following the procedure given in Examples 1 to 3. By the use of a
modified oligonucleotide in the ligation step, the double-stranded cDNA can be attached to a
support and used for the cloning of concatemers as described herein.

Example 13: Amplification step for a sample

In cases where the amount of a sample is limiting to the invention, the sample material can be
amplified by the following approach. In a first step, a plurality of mRNAs is treated as
described in Example 12 to replace the cap structure by an appropriate oligonucleotide
having a class IIs recognition site. In a second step, the aforementioned template is amplified
by a PCR step using a primer complementary to the linker and a poly-A primer. The PCR
product can be used for the invention as described in Example 1.

Example 14: Utilization of extended 5'-end sequences

Initial 5' end sequences obtained for concatemers can be used to synthesize sequencing
primers to obtain extended sequence information on the 5' end of a transcribed region.

Example 15: Gene inactivation

Sequence information obtained from 5' end specific sequence tags can be used for the design
of anti-sense probes or RNAi, which could be applied in knockdown studies.
CLAIMS

1. A method for preparing a DNA fragment corresponding to a nucleotide sequence of a 5'
end region of an mRNA, comprising the steps of:

(a) preparing a nucleic acid corresponding to a nucleotide sequence of the 5' end of an
mRNA;

(b) attaching at least one linker to the nucleic acid;

(c) cleaving the nucleic acid with a restriction enzyme having its recognition site
within the linker and its cleavage site within the nucleic acid corresponding to the 5' end of
the mRNA; and

(d) collecting a resulting DNA fragment corresponding to the 5' end of the mRNA.

2. The method according to claim 1, wherein the length of the DNA fragment is about 5-100 bp.

3. The method according to claim 1, wherein the length of the DNA fragment is about 15-30 bp.

4. The method according to claim 1, wherein the length of the DNA fragment is about 10-30 bp.

5. The method according to claim 1, wherein the nucleic acid in step (a) is derived from one
selected from the group consisting of a total RNA, an mRNA and a full-length cDNA.

6. The method according to claim 1, wherein step (a) comprises the steps of:
substituting a 5' cap structure of the mRNA with an oligonucleotide; and
synthesizing a first-strand cDNA using the mRNA as a template to produce a nucleic acid
corresponding to the 5' end of the mRNA.

7. A method for preparing a DNA fragment corresponding to a nucleotide sequence of a 5' 
end region of an mRNA, comprising steps of: 

(a) substituting a cap structure of an mRNA with an oligonucleotide, wherein the 
oligonucleotide comprises a restriction enzyme recognition site, and a cleavage site of a 
restriction enzyme is within the nucleic acid corresponding to the 5' end of the mRNA; 

(b) synthesizing a first strand cDNA using the mRNA as a template; 

(c) synthesizing a second strand cDNA using the first strand cDNA as a template; 

(d) cleaving a resulting double stranded cDNA with the restriction enzyme; and 

(e) collecting a resulting DNA fragment corresponding to the 5' end of the mRNA. 

8. The method according to claim 1 or 7, wherein the nucleic acid in step (a) is derived from 
a biological sample, an in vitro synthesized RNA, a cDNA library, artificially created 
pluralities of nucleic acids, or a tag library. 

9. The method according to claim 1, wherein step (a) comprises the steps of: 

synthesizing first-strand cDNAs using RNAs as a template and producing 
cDNA/RNA hybrids of the resulting first-strand cDNAs and the RNAs; 

selecting a particular cDNA/RNA hybrid that has the 5' cap structure of the mRNA 
using a selective binding substance which specifically recognizes the 5' cap structure; and 

recovering a nucleic acid corresponding to the 5' end of the mRNA. 

10. The method according to claim 9, wherein the nucleic acid prepared in step (a) is a full- 
length cDNA, wherein the selective binding substance is attached to a support. 

11. The method according to claim 1, wherein step (a) comprises the steps of: 

synthesizing first strand cDNAs using RNAs as a template and producing 
cDNA/RNA hybrids of the resulting first strand cDNAs and the RNAs; and 

recovering a nucleic acid corresponding to the 5' end region of the mRNA from the 
cDNA/RNA hybrids. 

12. The method according to claim 1, wherein step (a) comprises the steps of: 

synthesizing first strand cDNAs using RNAs as a template and producing 
cDNA/RNA hybrids of the resulting first strand cDNAs and the RNAs; 

conjugating a selective binding substance to a 5' cap structure of an mRNA present in 
the RNAs; 

contacting the cDNA/RNA hybrids with a support, wherein another matching 
selective binding substance is fixed to the support, and the matching selective binding 
substance specifically binds to the selective binding substance; and 

recovering a nucleic acid corresponding to the 5' end of the mRNA from the 
mRNA fixed to the support. 

13. The method according to claim 9 or 10, wherein the selective binding substance is a cap 
binding protein or a cap binding antibody. 

14. The method according to claim 12, wherein the selective binding substance is biotin, and 
the matching selective binding substance is selected from the group consisting of avidin, 
streptavidin and a derivative thereof which specifically binds to biotin. 

15. The method according to claim 12, wherein the selective binding substance is 
digoxigenin and the matching selective binding substance is an antibody against digoxigenin. 

16. The method according to claim 10 or 12, wherein the support is made of magnetic beads, 
agarose beads, latex beads, Sepharose matrix, silica gel matrix or glass beads. 

17. The method according to claim 1, wherein step (b) comprises the steps of: 

attaching a linker to an end region corresponding to the nucleotide sequence of a 5' 
end region of the mRNA, wherein the linker carries at least one restriction enzyme 
recognition site for a restriction enzyme that cleaves a site different from its recognition 
sequence; 

synthesizing a second-strand cDNA using the nucleic acid as a template; 

treating a resulting linker-bound double-stranded cDNA with the restriction enzyme; and 

recovering a resulting fragment which contains a linker moiety and a part of cDNA 
corresponding to the 5' end regions of the mRNA. 

18. The method according to claim 17, wherein the linker contains a double-stranded 
oligonucleotide region, and the second-strand cDNA is synthesized using the linker. 

19. The method according to claim 17, wherein the second-strand cDNA is synthesized using 
other oligonucleotides which are partially or totally complementary to the linker. 

20. The method according to claim 17, wherein a selective binding substance is attached to or 
included in the linker, and the recovering step comprises the steps of binding the selective 
binding substance to a matching selective binding substance immobilized on a support, and 
recovering the support, wherein the matching selective binding substance specifically binds 
to the selective binding substance. 

21. The method according to claim 20, wherein the selective binding substance is biotin, and 
the matching selective binding substance is selected from the group consisting of avidin, 
streptavidin and a derivative thereof which specifically binds to biotin. 

22. The method according to claim 20, wherein the selective binding substance is 
digoxigenin, and the matching selective binding substance is an antibody against digoxigenin. 

23. The method according to claim 17, wherein the restriction enzyme is a Class II or Class 
III restriction enzyme. 

24. The method according to claim 17, wherein the restriction enzyme is a Class IIG or 
Class IIS restriction enzyme. 

25. The method according to claim 23, wherein the restriction enzyme is selected from the 
group consisting of GsuI, MmeI, BpmI, BsgI and EcoP15I. 

26. A method for determining a nucleotide sequence of the 5' end region of the mRNA by 
sequencing the DNA fragment prepared by the method according to claim 1. 

27. The method according to claim 1, further comprising amplifying the nucleic acid 
corresponding to the 5' end region of the mRNA by a DNA polymerase or a cocktail of DNA 
polymerases. 

28. The method according to claim 27, wherein the DNA polymerase is heat-stable. 

29. The method according to claim 27, wherein the DNA polymerase is selected from the 
group consisting of Taq polymerase, Pwo DNA polymerase, Kod DNA polymerase, Pfu 
DNA polymerase, Vent DNA polymerase, Deep Vent DNA polymerase, rBST DNA 
polymerase, and MasterAmp AmpliTherm DNA polymerase. 

30. The method according to claim 1, wherein the first strand cDNA is synthesized and 
fractionated by physical means. 


31. The method according to claim 30, wherein the nucleic acid is fractionated by 
hybridizing to a plurality of nucleic acids. 

32. A method according to claim 1, further comprising the step of attaching the collected 
nucleic acid to beads. 

33. A method for preparing a concatemer comprising one or more DNA fragments, 
comprising the step of ligating one or more of DNA fragments obtained by the method 
according to claim 1 and corresponding to the 5' end of the mRNA. 

34. A concatemer prepared by the method according to claim 33. 

35. A vector comprising the concatemer according to claim 34. 

36. A sequence derived from the concatemer according to claim 34. 
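
To illustrate how individual tags might be recovered from a sequenced concatemer, the following sketch splits a concatemer at a known spacer motif. The spacer "tctaga", the length window and the function name split_concatemer are assumptions made for this example; tags that happen to contain the spacer motif, or tags cloned in reverse orientation, are not handled.

```python
def split_concatemer(concatemer: str, spacer: str,
                     min_len: int = 15, max_len: int = 30):
    """Split a concatemer read at `spacer` and keep plausible tag-sized pieces.

    Spaces (10-mer grouping) are ignored. Pieces outside the length window,
    such as vector or linker remnants, are discarded.
    """
    pieces = concatemer.replace(" ", "").split(spacer)
    return [p for p in pieces if min_len <= len(p) <= max_len]


SPACER = "tctaga"  # hypothetical spacer left between ligated tags
tags_in = [
    "gtggcccgggagggcggggc",   # SEQ ID NO: 33
    "atgacaaacatacgaaaaac",   # SEQ ID NO: 35
    "gtccattcctgagagtctc",    # SEQ ID NO: 36
]
concatemer = SPACER.join(tags_in)   # toy concatemer built for the example
tags_out = split_concatemer(concatemer, SPACER)
print(tags_out)
```

In practice the spacer would be whatever linker-derived sequence remains between tags after ligation.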

37. The method for determining the transcriptional states of a sample using a sequence 
derived from the DNA fragment prepared by the method according to claim 1. 

38. The method for obtaining expression data on a plurality of mRNAs or cDNAs in a sample 
using a sequence derived from the DNA fragment prepared by the method according to claim 1. 

39. The method for quantifying expression data on a plurality of mRNAs in a sample using a 
sequence derived from the DNA fragment prepared by the method according to claim 1. 
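
As a hedged illustration of how tag sequences could be turned into expression data, the sketch below counts identical tags and scales the counts to a fixed total. The function name tag_frequencies and the per-million scaling are assumptions of this example, not a procedure stated in the claims.

```python
from collections import Counter


def tag_frequencies(tags, per: int = 1_000_000):
    """Count identical tags and scale each count to `per` total tags.

    Returns {tag: (raw_count, scaled_count)}; an empty input yields {}.
    """
    counts = Counter(tags)
    total = sum(counts.values())
    return {tag: (n, n * per / total) for tag, n in counts.items()}


# Toy observations: three reads of one tag and one read of another.
observed = ["gtggcccgggagggcggggc"] * 3 + ["atgacaaacatacgaaaaac"]
freqs = tag_frequencies(observed)
print(freqs)
```

Raw counts are kept alongside the scaled values so that low-count tags can be treated with appropriate caution.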


40. The method for building a database holding sequence information using a sequence 
derived from the DNA fragment prepared by the method according to claim 1. 

41. The method for identifying transcribed regions from a genomic sequence using a sequence 
derived from the DNA fragment prepared by the method according to claim 1. 

42. The method for identifying a transcription initiation site and a related regulatory sequence 
in a genomic sequence using a sequence derived from the DNA fragment prepared by the 
method according to claim 1. 


43. The method for cloning a full-length or partial cDNA from a cDNA library or biological 
sample using a sequence derived from the DNA fragment prepared by the method according 
to claim 1. 

44. The method for cloning a complete or partial promoter region of a gene from a genomic 
library or genomic DNA using a sequence derived from the DNA fragment prepared by the 
method according to claim 1. 

45. The method for analyzing the activity of regulatory regions in a genome based on 
genomic sequence information using a sequence derived from the DNA fragment prepared 
by the method according to claim 1. 

46. The method for inactivating a gene or altering its expression using a sequence derived 
from the DNA fragment prepared by the method according to claim 1. 

47. The method according to claim 46, wherein the gene is inactivated or altered in its 
expression by the means of siRNA or RNAi. 

48. The method for synthesizing a nucleotide sequence to be used as the linker or primer 
based on a sequence derived from the DNA fragment prepared by the method according to 
claim 1. 

49. The method for synthesizing a hybridization probe based on a sequence derived from the 
DNA fragment prepared by the method according to claim 1. 

50. The method according to claim 49, wherein the hybridization probe is attached to a 
support. 

51. The method according to claim 49, wherein the hybridization probe is a probe to identify 
the sequence corresponding to the nucleotide sequence of the 5' end region of the mRNA. 

52. The method according to claim 1, further comprising extending the 5' end region of the 
nucleotide sequence. 

53. A method according to claim 1 used for the development of diagnostic tools. 

54. A method according to claim 1 used for the development of research tools. 

55. A method according to claim 1 used for the development of a reagent or a kit. 

SEQUENCE LISTING 

<110> RIKEN 

KABUSHIKI KAISHA DNAFORM 

<120> Method for utilizing the 5' end of mRNA for cloning and analysis 
<130> 1336(PCT) 

<150> JP 2002-171851 
<151> 2002-06-12 

<150> JP 2002-235294 
<151> 2002-08-12 

<160> 77 

<170> PatentIn version 3.1 

<210> 1 

<211> 74 

<212> DNA 

<213> Artificial 

<220> 

<223> First strand cDNA primer 
<220> 

<221> misc_feature 

<222> (73)..(73) 

<223> "v" is A, C or G 

<220> 

<221> misc_feature 

<222> (74)..(74) 

<223> "n" is any nucleotide 

<400> 1 

gagagagaga aaggatcctg ccatttcatt acctctttct ccgcacccga catagatttt 60 
tttttttttt ttvn 74 
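
The degenerate positions in this listing (v, n, r, y) follow the IUPAC nucleotide codes. As a small sketch, such an oligonucleotide can be expanded into a regular expression to test which concrete sequences it covers. The function name iupac_to_regex is hypothetical, and the table covers only the codes that occur in this listing.

```python
import re

# IUPAC codes used in this sequence listing (subset of the full alphabet).
IUPAC = {
    "a": "a", "c": "c", "g": "g", "t": "t",
    "r": "[ag]",    # purine
    "y": "[ct]",    # pyrimidine
    "v": "[acg]",   # not t
    "n": "[acgt]",  # any nucleotide
}


def iupac_to_regex(oligo: str) -> str:
    """Translate a degenerate oligonucleotide into a regex pattern string."""
    return "".join(IUPAC[base] for base in oligo.lower().replace(" ", ""))


anchor = iupac_to_regex("ttvn")   # 3' anchor of the oligo-dT primer above
print(anchor)
print(bool(re.fullmatch(anchor, "ttca")), bool(re.fullmatch(anchor, "ttta")))
```

The "vn" anchor excludes a further t at the first anchor position, which is what forces the primer to the start of the poly-A tail.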

<210> 2 

<211> 60 

<212> DNA 

<213> Artificial 

<220> 

<223> Upper oligonucleotide GN5 
<220> 

<221> misc_feature 

<222> (56)..(60) 

<223> "n" is any nucleotide 

<400> 2 

agagagagac ctcgagtaac tataacggtc ctaaggtagc gacctaggtc cgacgnnnnn 60 

<210> 3 

<211> 60 

<212> DNA 

<213> Artificial 

<220> 

<223> Upper oligonucleotide N6 






<220> 

<221> misc_feature 

<222> (55)..(60) 

<223> "n" is any nucleotide 

<400> 3 

agagagagac ctcgagtaac tataacggtc ctaaggtagc gacctaggtc cgacnnnnnn 60 

<210> 4 

<211> 54 

<212> DNA 

<213> Artificial 

<220> 

<223> Lower oligonucleotide 

<400> 4 

gtcggaccta ggtcgctacc ttaggaccgt tatagttact cgaggtctct ctct 54 

<210> 5 

<211> 55 

<212> DNA 

<213> Artificial 

<220> 

<223> primer 

<400> 5 

agagagagac ctcgagtaac tataacggtc ctaaggtagc gacctaggtc cgacg 55 

<210> 6 

<211> 45 

<212> DNA 

<213> Artificial 

<220> 

<223> linker 

<400> 6 

tctagatcag gactcttcta tagtgtcacc taaagtctct ctctc 45 

<210> 7 

<211> 47 

<212> DNA 

<213> Artificial 

<220> 

<223> linker 
<220> 

<221> misc_feature 

<222> (46)..(47) 

<223> "n" is any nucleotide 

<400> 7 

gagagagaga ctttaggtga cactatagaa gagtcctgat ctagann 47 

<210> 8 

<211> 25 

<212> DNA 

<213> Artificial 

<220> 

<223> Primer 1 (uni-PCR) 

<400> 8 

gagagagaga ctttaggtga cacta 25 

<210> 9 

<211> 25 

<212> DNA 

<213> Artificial 

<220> 

<223> Primer 2(MmeI-PCR) 

<400> 9 

agagagagac ctcgagtaac tataa 25 

<210> 10 

<211> 44 

<212> DNA 

<213> Artificial 

<220> 

<223> first strand oligo-dT primer 
<220> 

<221> misc_feature 

<222> (43)..(43) 

<223> "v" is A, C or G 

<220> 

<221> misc_feature 

<222> (44)..(44) 

<223> "n" is any nucleotide 

<400> 10 

gagagagaga ggatccttct ggagagtttt tttttttttt ttvn 44 

<210> 11 

<211> 45 

<212> DNA 

<213> Artificial 

<220> 

<223> Oligonucleotide Bg-Gsu-GN5 
<220> 

<221> misc_feature 

<222> (41)..(45) 

<223> "n" is any nucleotide 

<400> 11 

agagagagaa ctaggcttaa taggtgacta gatctggagg nnnnn 45 

<210> 12 

<211> 45 

<212> DNA 

<213> Artificial 

<220> 

<223> Oligonucleotide Bg-Gsu-N6 
<220> 

<221> misc_feature 

<222> (40)..(45) 

<223> "n" is any nucleotide 

<400> 12 

agagagagaa ctaggcttaa taggtgacta gatctggagn nnnnn 45 

<210> 13 

<211> 39 

<212> DNA 

<213> Artificial 

<220> 

<223> Oligonucleotide Bg-Gsu-down 

<400> 13 

ctggagatct agtcacctat taagcctagt tctctctct 39 

<210> 14 

<211> 47 

<212> DNA 

<213> Artificial 

<220> 

<223> Oligonucleotide Bg-Mme-GN5 
<220> 

<221> misc_feature 

<222> (43)..(47) 

<223> "n" is any nucleotide 

<220> 

<221> misc_feature 

<222> (39)..(39) 

<223> "r" is G or A 

<400> 14 

agagagagaa ctaggcttaa taggtgacta gatcttccra cgnnnnn 47 

<210> 15 

<211> 47 

<212> DNA 

<213> Artificial 

<220> 

<223> Oligonucleotide Bg-Mme-N6 
<220> 

<221> misc_feature 

<222> (42)..(47) 

<223> "n" is any nucleotide 

<220> 

<221> misc_feature 

<222> (39)..(39) 

<223> "r" is G or A 

<400> 15 

agagagagaa ctaggcttaa taggtgacta gatcttccra cnnnnnn 47 

<210> 16 

<211> 40 

<212> DNA 

<213> Artificial 

<220> 

<223> Oligonucleotide Bg-Mme-down 
<220> 

<221> misc_feature 

<222> (3)..(3) 

<223> "y" is C or T 



<400> 16 

gtyggagatc tagtcaccta ttaagcctag ttctctctct 40 



<210> 17 

<211> 47 

<212> DNA 

<213> Artificial 

<220> 

<223> oligonucleotide 
<220> 

<221> misc_feature 

<222> (46)..(47) 

<223> "n" is any nucleotide 



<400> 17 

gagagagaga ctttaggtga cactatagaa gagtcctgag aattcnn 47 



<210> 18 

<211> 45 

<212> DNA 

<213> Artificial 

<220> 

<223> oligonucleotide 

<400> 18 

gaattctcag gactcttcta tagtgtcacc taaagtctct ctctc 45 



<210> 19 

<211> 49 

<212> DNA 

<213> Artificial 

<220> 

<223> linker 
<220> 

<221> misc_feature 

<222> (45)..(49) 

<223> "n" is any nucleotide 



<400> 19 

agagagagag cttagatgag agtgactcga gcctaggtcc aacgnnnnn 49 



<210> 20 

<211> 49 

<212> DNA 

<213> Artificial 

<220> 

<223> linker 
<220> 

<221> misc_feature 

<222> (44)..(49) 

<223> "n" is any nucleotide 



<400> 20 

agagagagag cttagatgag agtgactcga gcctaggtcc aacnnnnnn 49 



<210> 21 

<211> 43 

<212> DNA 

<213> Artificial 

<220> 

<223> linker 

<400> 21 

gttggaccta ggctcgagtc actctcatct aagctctctc tct 43 



<210> 22 

<211> 42 

<212> DNA 

<213> Artificial 

<220> 

<223> second linker 

<400> 22 

gaattctacg cctctcgatc gaaatcccga tctaggctag cg 42 



<210> 23 

<211> 42 

<212> DNA 

<213> Artificial 

<220> 

<223> second linker 

<400> 23 

cttaagatgc ggagagcgtg aatcgagttt aaggctagca tc 42 



<210> 24 

<211> 25 

<212> DNA 

<213> Artificial 

<220> 

<223> primer 

<400> 24 

ttagatgaga gtgactcgag cctag 25 



<210> 25 

<211> 25 

<212> DNA 

<213> Artificial 

<220> 

<223> primer 

<400> 25 

ctacgatcgg aatttgagct aagtg 25 

<210> 26 

<211> 17 

<212> DNA 

<213> Artificial 

<220> 

<223> primer 
<400> 26 

caggaaacag ctatgac 17 

<210> 27 

<211> 16 

<212> DNA 

<213> Artificial 

<220> 

<223> primer 
<400> 27 

gtaaaacgac ggccag 16 

<210> 28 

<211> 596 

<212> DNA 

<213> Homo sapiens 

<220> 

<221> misc_feature 

<222> (432)..(432) 

<223> "n" is any nucleotide 

<220> 

<221> misc_feature 

<222> (436)..(436) 

<223> "n" is any nucleotide 

<220> 

<221> misc_feature 

<222> (456)..(456) 

<223> "n" is any nucleotide 

<400> 28 

tcgttaacta ttaggcgaat tgggccctct aggtcgacga gttctcagca gagccgccgt 60 

ctagagcccc gccctcccgg gccaccgtcg gacctagaat agttactcga ggtctctcgt 120 

cggacctaga gtttttcgta tgtttgtcat cgtcggacct aggtccgacg gtccattcct 180 

gagagtctct ctaggtccga cgagagagag aggatccttc tgtctagacc ctgacgccgg 240 

aaccgcaccg tcggacctag gtccgacgga aaagcagctt cctccactct aggtccgacg 300 

gtgtgtgtgt gtgtgcgtgt tctagagact ggttcagatc aaaagtcgtc ggacctaggt 360 

ccgacggggc tggtgagatg gctcagtcta gatgcatgct cgagcggccg ccagtgtgat 420 

ggatatctgc cnaatnccag cacaccggcg cgcgcnacca gtggatccga gcccggtacc 480 

aagcttgatg catacctcga gtatcctata ctgtcaccta aatagcttgg ggtaatcatg 540 

gtcatagctg tctcctgtgt gaaattgtta tccgctcaaa attcccaaca acatag 596 

<210> 29 

<211> 12 

<212> DNA 

<213> Artificial 

<220> 

<223> linker 
<220> 

<221> misc_feature 

<222> (1)..(1) 

<223> "n" is any nucleotide 

<400> 29 

nctaggtccg ac 12 

<210> 30 

<211> 20 

<212> DNA 

<213> Artificial 

<220> 

<223> linker 
<220> 

<221> misc_feature 

<222> (1)..(1) 

<223> "n" is any nucleotide 

<220> 

<221> misc_feature 

<222> (20)..(20) 

<223> "n" is any nucleotide 

<400> 30 

ngttggacct aggtccaacn 20 

<210> 31 

<211> 13 

<212> DNA 

<213> Artificial 

<220> 

<223> linker 

<400> 31 

tctaggtccg acg 13 

<210> 32 

<211> 13 

<212> DNA 

<213> Artificial 

<220> 

<223> linker 

<400> 32 

cctaggtccg acg 13 

<210> 33 

<211> 20 

<212> DNA 

<213> Artificial 

<220> 

<223> tag1 

<400> 33 

gtggcccggg agggcggggc 20 



<210> 34 

<211> 19 

<212> DNA 

<213> Artificial 

<220> 

<223> tag2 

<400> 34 

agagacctcg agtaactat 19 



<210> 35 

<211> 20 

<212> DNA 

<213> Artificial 

<220> 

<223> tag3 

<400> 35 

atgacaaaca tacgaaaaac 20 



<210> 36 

<211> 19 

<212> DNA 

<213> Artificial 

<220> 

<223> tag4 

<400> 36 

gtccattcct gagagtctc 19 



<210> 37 

<211> 20 

<212> DNA 

<213> Artificial 

<220> 

<223> tag5 

<400> 37 

agagagagag gatccttctg 20 



<210> 38 

<211> 20 

<212> DNA 

<213> Artificial 

<220> 

<223> tag6 

<400> 38 

gtgcggttcc ggcgtcaggg 20 

<210> 39 

<211> 19 

<212> DNA 

<213> Artificial 

<220> 

<223> tag7 

<400> 39 

gaaaagcagc ttcctccac 19 



<210> 40 

<211> 20 

<212> DNA 

<213> Artificial 

<220> 

<223> tag8 

<400> 40 

gtgtgtgtgt gtgtgcgtgt 20 



<210> 41 

<211> 20 

<212> DNA 

<213> Artificial 

<220> 

<223> tag9 

<400> 41 

acttttgatc tgaaccagtc 20 



<210> 42 

<211> 20 

<212> DNA 

<213> Artificial 

<220> 

<223> tag10 

<400> 42 

gggctggtga gatggctcag 20 



<210> 43 

<211> 19 

<212> DNA 

<213> Artificial 

<220> 

<223> tag1 

<400> 43 

gtacctcctc gcatcccgc 19 



<210> 44 

<211> 20 

<212> DNA 

<213> Artificial 

<220> 

<223> tag2 

<400> 44 

gtggtgtgcg tgtcgaaggt 20 

<210> 45 

<211> 20 

<212> DNA 

<213> Artificial 

<220> 

<223> tag3 

<400> 45 

atggaccgag ggccccagcc 20 



<210> 46 

<211> 19 

<212> DNA 

<213> Artificial 

<220> 

<223> tag4 

<400> 46 

cggatcgggt gggtcggac 19 



<210> 47 

<211> 19 

<212> DNA 

<213> Artificial 

<220> 

<223> tag5 

<400> 47 

agaggtcgca gcagttcgt 19 



<210> 48 

<211> 20 

<212> DNA 

<213> Artificial 

<220> 

<223> tag6 

<400> 48 

tctccggagc cggcgctgtg 20 



<210> 49 

<211> 19 

<212> DNA 

<213> Artificial 

<220> 

<223> tag7 

<400> 49 

agactttgca ggctccgag 19 



<210> 50 

<211> 20 

<212> DNA 

<213> Artificial 

<220> 

<223> tag8 

<400> 50 

ggagctgccg cagcgccgga 20 



<210> 51 

<211> 20 

<212> DNA 

<213> Artificial 

<220> 

<223> tag9 

<400> 51 

acaccgtcgg acctggtcgc 20 



<210> 52 

<211> 20 

<212> DNA 

<213> Artificial 

<220> 

<223> tag10 

<400> 52 

agacgttctc gcccagagtc 20 

<210> 53 

<211> 20 

<212> DNA 

<213> Artificial 

<220> 

<223> tag11 

<400> 53 

gccgttcctt gcttgctgga 20 

<210> 54 

<211> 20 

<212> DNA 

<213> Artificial 

<220> 

<223> tag12 

<400> 54 

gggttgggga tttagctcag 20 

<210> 55 

<211> 19 

<212> DNA 

<213> Artificial 

<220> 

<223> tag13 

<400> 55 

gagtaactat aacggtcct 19 

<210> 56 

<211> 19 

<212> DNA 

<213> Artificial 

<220> 

<223> tag14 

<400> 56 

gattccgcct ggagctcgc 19 



<210> 57 

<211> 18 

<212> DNA 

<213> Artificial 

<220> 

<223> tag15 

<400> 57 

agggaccgct gcggtccg 18 



<210> 58 

<211> 18 

<212> DNA 

<213> Artificial 

<220> 

<223> zzb21106i09t3 junk 

<400> 58 

cattagggga ttgggccc 18 

<210> 59 

<211> 28 

<212> DNA 

<213> Artificial 

<220> 

<223> zzb21106i09t3 junk 

<400> 59 

acccgggggg cgggactaac cgtcggac 28 

<210> 60 

<211> 20 

<212> DNA 

<213> Artificial 

<220> 

<223> tag1 

<400> 60 

gacgcggaag gcgcggcggc 20 

<210> 61 

<211> 21 

<212> DNA 

<213> Artificial 

<220> 

<223> tag2 

<400> 61 

ggagggcggg cggcggccct c 21 



<210> 62 
<211> 19 
<212> DNA 



<213> Artificial 

<220> 

<223> tag3 
<400> 62 

C3,3,Sl3,3,3,3,3,cL 2tcL3,3,3,8lcLCi> 19 



<210> 63 

<211> 20 

<212> DNA 

<213> Artificial 

<220> 

<223> tag4 

<400> 63 

aggctctgct cgctctgccc 20 

<210> 64 

<211> 19 

<212> DNA 

<213> Artificial 

<220> 

<223> tag5 

<400> 64 

acttctgatt ctgacagac 19 

<210> 65 

<211> 19 

<212> DNA 

<213> Artificial 

<220> 

<223> tag6 

<400> 65 

acagtggcgt ctgcaaagc 19 

<210> 66 

<211> 20 

<212> DNA 

<213> Artificial 

<220> 

<223> tag7 

<400> 66 

ggaaagtcca ggtggacttt 20 

<210> 67 

<211> 19 

<212> DNA 

<213> Artificial 

<220> 

<223> tag8 

<400> 67 

gccgccgagg ccgcgcagg 19 

<210> 68 

<211> 19 

<212> DNA 

<213> Artificial 

<220> 

<223> tag9 

<400> 68 

gttagtgtat aacagagtt 19 

<210> 69 

<211> 20 

<212> DNA 

<213> Artificial 

<220> 

<223> tag10 

<400> 69 

tcgcccgctg ttcagtctct 20 

<210> 70 

<211> 19 

<212> DNA 

<213> Artificial 

<220> 

<223> tag11 

<400> 70 

aggtggggca agatggctg 19 

<210> 71 

<211> 20 

<212> DNA 

<213> Artificial 

<220> 

<223> tag12 

<400> 71 

ggcatggcca gaaggcaagc 20 

<210> 72 

<211> 20 

<212> DNA 

<213> Artificial 

<220> 

<223> tag13 

<400> 72 

gacgcacgca tagagggggg 20 

<210> 73 

<211> 19 

<212> DNA 

<213> Artificial 

<220> 

<223> tag14 

<220> 

<221> misc_feature 

<222> (1)..(1) 

<223> "n" is any nucleotide 

<400> 73 

nccatggaac agccacact 19 

<210> 74 

<211> 26 

<212> DNA 

<213> Artificial 

<220> 

<223> junk1 
<400> 74 

tgataaggca atggcctcta atgctg 26 

<210> 75 

<211> 16 

<212> DNA 

<213> Artificial 

<220> 

<223> tag 
<400> 75 

acctccctcc gcggag 16 

<210> 76 

<211> 2745 

<212> DNA 

<213> Homo sapiens 

<220> 

<221> CDS 

<222> (842)..(2017) 
<223> 

<400> 76 

acctccctcc gcggagcagc cagacagcga gggccccggc cgggggcagg ggggacgccc 60 

cgtccggggc accccccccg gctctgagcc gcccgcgggg ccggcctcgg cccggagcgg 120 

aggaaggagt cgccgaggag cagcctgagg ccccagagtc tgagacgagc cgccgccgcc 180 

cccgccactg cggggaggag ggggaggagg agcgggagga gggacgagct ggtcgggaga 240 

agaggaaaaa aacttttgag acttttccgt tgccgctggg agccggaggc gcggggacct 300 

cttggcgcga cgctgccccg cgaggaggca ggacttgggg accccagacc gcctcccttt 360 

gccgccgggg acgcttgctc cctccctgcc ccctacacgg cgtccctcag gcgcccccat 420 

tccggaccag ccctcgggag tcgccgaccc ggcctcccgc aaagactttt ccccagacct 480 

cgggcgcacc ccctgcacgc cgccttcatc cccggcctgt ctcctgagcc cccgcgcatc 540 

ctagaccctt tctcctccag gagacggatc tctctccgac ctgccacaga tcccctattc 600 

aagaccaccc accttctggt accagatcgc gcccatctag gttatttccg tgggatactg 660 

agacaccccc ggtccaagcc tcccctccac cactgcgccc ttctccctga ggagcctcag 720 

ctttccctcg aggccctcct accttttgcc gggagacccc cagcccctgc aggggcgggg 780 

cctccccacc acaccagccc tgttcgcgct ctcggcagtg ccggggggcg ccgcctcccc 840 

c atg ccg ccc tcc ggg ctg cgg ctg ctg ccg ctg ctg cta ccg ctg ctg 889 
Met Pro Pro Ser Gly Leu Arg Leu Leu Pro Leu Leu Leu Pro Leu Leu 
1 5 10 15 

tgg cta ctg gtg ctg acg cct ggc ccg ccg gcc gcg gga cta tcc acc 937 
Trp Leu Leu Val Leu Thr Pro Gly Pro Pro Ala Ala Gly Leu Ser Thr 
20 25 30 

tgc aag act atc gac atg gag ctg gtg aag cgg aag cgc atc gag gcc 985 
Cys Lys Thr Ile Asp Met Glu Leu Val Lys Arg Lys Arg Ile Glu Ala 
35 40 45 

atc cgc ggc cag atc ctg tcc aag ctg cgg ctc gcc agc ccc ccg agc 1033 
Ile Arg Gly Gln Ile Leu Ser Lys Leu Arg Leu Ala Ser Pro Pro Ser 
50 55 60 

cag ggg gag gtg ccg ccc ggc ccg ctg ccc gag gcc gtg ctc gcc ctg 1081 
Gln Gly Glu Val Pro Pro Gly Pro Leu Pro Glu Ala Val Leu Ala Leu 
65 70 75 80 

tac aac agc acc cgc gac cgg gtg gcc ggg gag agt gca gaa ccg gag 1129 
Tyr Asn Ser Thr Arg Asp Arg Val Ala Gly Glu Ser Ala Glu Pro Glu 
85 90 95 

ccc gag cct gag gcc gac tac tac gcc aag gag gtc acc cgc gtg cta 1177 
Pro Glu Pro Glu Ala Asp Tyr Tyr Ala Lys Glu Val Thr Arg Val Leu 
100 105 110 

atg gtg gaa acc cac aac gaa atc tat gac aag ttc aag cag agt aca 1225 
Met Val Glu Thr His Asn Glu Ile Tyr Asp Lys Phe Lys Gln Ser Thr 
115 120 125 

cac agc ata tat atg ttc ttc aac aca tca gag ctc cga gaa gcg gta 1273 
His Ser Ile Tyr Met Phe Phe Asn Thr Ser Glu Leu Arg Glu Ala Val 
130 135 140 

cct gaa ccc gtg ttg ctc tcc cgg gca gag ctg cgt ctg ctg agg agg 1321 
Pro Glu Pro Val Leu Leu Ser Arg Ala Glu Leu Arg Leu Leu Arg Arg 
145 150 155 160 

ctc aag tta aaa gtg gag cag cac gtg gag ctg tac cag aaa tac agc 1369 
Leu Lys Leu Lys Val Glu Gln His Val Glu Leu Tyr Gln Lys Tyr Ser 
165 170 175 

aac aat tcc tgg cga tac ctc agc aac cgg ctg ctg gca ccc agc gac 1417 
Asn Asn Ser Trp Arg Tyr Leu Ser Asn Arg Leu Leu Ala Pro Ser Asp 
180 185 190 

tcg cca gag tgg tta tct ttt gat gtc acc gga gtt gtg cgg cag tgg 1465 
Ser Pro Glu Trp Leu Ser Phe Asp Val Thr Gly Val Val Arg Gln Trp 
195 200 205 

ttg agc cgt gga ggg gaa att gag ggc ttt cgc ctt agc gcc cac tgc 1513 
Leu Ser Arg Gly Gly Glu Ile Glu Gly Phe Arg Leu Ser Ala His Cys 
210 215 220 

tcc tgt gac agc agg gat aac aca ctg caa gtg gac atc aac ggg ttc 1561 
Ser Cys Asp Ser Arg Asp Asn Thr Leu Gln Val Asp Ile Asn Gly Phe 
225 230 235 240 

act acc ggc cgc cga ggt gac ctg gcc acc att cat ggc atg aac cgg 1609 
Thr Thr Gly Arg Arg Gly Asp Leu Ala Thr Ile His Gly Met Asn Arg 
245 250 255 

cct ttc ctg ctt ctc atg gcc acc ccg ctg gag agg gcc cag cat ctg 1657 
Pro Phe Leu Leu Leu Met Ala Thr Pro Leu Glu Arg Ala Gln His Leu 
260 265 270 

caa agc tcc cgg cac cgc cga gcc ctg gac acc aac tat tgc ttc agc 1705 
Gln Ser Ser Arg His Arg Arg Ala Leu Asp Thr Asn Tyr Cys Phe Ser 
275 280 285 

tcc acg gag aag aac tgc tgc gtg cgg cag ctg tac att gac ttc cgc 1753 
Ser Thr Glu Lys Asn Cys Cys Val Arg Gln Leu Tyr Ile Asp Phe Arg 
290 295 300 

aag gac ctc ggc tgg aag tgg atc cac gag ccc aag ggc tac cat gcc 1801 
Lys Asp Leu Gly Trp Lys Trp Ile His Glu Pro Lys Gly Tyr His Ala 
305 310 315 320 

aac ttc tgc ctc ggg ccc tgc ccc tac att tgg agc ctg gac acg cag 1849 
Asn Phe Cys Leu Gly Pro Cys Pro Tyr Ile Trp Ser Leu Asp Thr Gln 
325 330 335 

tac agc aag gtc ctg gcc ctg tac aac cag cat aac ccg ggc gcc tcg 1897 
Tyr Ser Lys Val Leu Ala Leu Tyr Asn Gln His Asn Pro Gly Ala Ser 
340 345 350 

gcg gcg ccg tgc tgc gtg ccg cag gcg ctg gag ccg ctg ccc atc gtg 1945 
Ala Ala Pro Cys Cys Val Pro Gln Ala Leu Glu Pro Leu Pro Ile Val 
355 360 365 

tac tac gtg ggc cgc aag ccc aag gtg gag cag ctg tcc aac atg atc 1993 
Tyr Tyr Val Gly Arg Lys Pro Lys Val Glu Gln Leu Ser Asn Met Ile 
370 375 380 

gtg cgc tcc tgc aag tgc agc tga ggtcccgccc cgccccgccc cgccccggca 2047 
Val Arg Ser Cys Lys Cys Ser 
385 390 

ggcccggccc caccccgccc cgcccccgct gccttgccca tgggggctgt atttaaggac 2107 

accgtgcccc aagcccacct ggggccccat taaagatgga gagaggactg cggatctctg 2167 

tgtcattggg cgcctgcctg gggtctccat ccctgacgtt cccccactcc cactccctct 2227 

ctctccctct ctgcctcctc ctgcctgtct gcactattcc tttgcccggc atcaaggcac 2287 

aggggaccag tggggaacac tactgtagtt agatctattt attgagcacc ttgggcactg 2347 

ttgaagtgcc ttacattaat gaactcattc agtcaccata gcaacactct gagatggcag 2407 

ggactctgat aacacccatt ttaaaggttg aggaaacaag cccagagagg ttaagggagg 2467 

agttcctgcc caccaggaac ctgctttagt gggggatagt gaagaagaca ataaaagata 2527 

gtagttcagg ccaggcgggg tgctcacgcc tgtaatccta gcacttttgg gaggcagaga 2587 

tgggaggata cttgaatcca ggcatttgag accagcctgg gtaacatagt gagaccctat 2647 

ctctacaaaa cacttttaaa aaatgtacac ctgtggtccc agctactctg gaggctaagg 2707 

tgggaggatc acttgatcct gggaggtcaa ggctgcag 2745 

<210> 77 

<211> 391 

<212> PRT 

<213> Homo sapiens 

<400> 77 

Met Pro Pro Ser Gly Leu Arg Leu Leu Pro Leu Leu Leu Pro Leu Leu 
1 5 10 15 

Trp Leu Leu Val Leu Thr Pro Gly Pro Pro Ala Ala Gly Leu Ser Thr 
20 25 30 

Cys Lys Thr Ile Asp Met Glu Leu Val Lys Arg Lys Arg Ile Glu Ala 
35 40 45 

Ile Arg Gly Gln Ile Leu Ser Lys Leu Arg Leu Ala Ser Pro Pro Ser 
50 55 60 

Gln Gly Glu Val Pro Pro Gly Pro Leu Pro Glu Ala Val Leu Ala Leu 
65 70 75 80 

Tyr Asn Ser Thr Arg Asp Arg Val Ala Gly Glu Ser Ala Glu Pro Glu 
85 90 95 

Pro Glu Pro Glu Ala Asp Tyr Tyr Ala Lys Glu Val Thr Arg Val Leu 
100 105 110 

Met Val Glu Thr His Asn Glu Ile Tyr Asp Lys Phe Lys Gln Ser Thr 
115 120 125 

His Ser Ile Tyr Met Phe Phe Asn Thr Ser Glu Leu Arg Glu Ala Val 
130 135 140 

Pro Glu Pro Val Leu Leu Ser Arg Ala Glu Leu Arg Leu Leu Arg Arg 
145 150 155 160 

Leu Lys Leu Lys Val Glu Gln His Val Glu Leu Tyr Gln Lys Tyr Ser 
165 170 175 

Asn Asn Ser Trp Arg Tyr Leu Ser Asn Arg Leu Leu Ala Pro Ser Asp 
180 185 190 

Ser Pro Glu Trp Leu Ser Phe Asp Val Thr Gly Val Val Arg Gln Trp 
195 200 205 

Leu Ser Arg Gly Gly Glu Ile Glu Gly Phe Arg Leu Ser Ala His Cys 
210 215 220 

Ser Cys Asp Ser Arg Asp Asn Thr Leu Gln Val Asp Ile Asn Gly Phe 
225 230 235 240 

Thr Thr Gly Arg Arg Gly Asp Leu Ala Thr Ile His Gly Met Asn Arg 
245 250 255 

Pro Phe Leu Leu Leu Met Ala Thr Pro Leu Glu Arg Ala Gln His Leu 
260 265 270 

Gln Ser Ser Arg His Arg Arg Ala Leu Asp Thr Asn Tyr Cys Phe Ser 
275 280 285 

Ser Thr Glu Lys Asn Cys Cys Val Arg Gln Leu Tyr Ile Asp Phe Arg 
290 295 300 

Lys Asp Leu Gly Trp Lys Trp Ile His Glu Pro Lys Gly Tyr His Ala 
305 310 315 320 

Asn Phe Cys Leu Gly Pro Cys Pro Tyr Ile Trp Ser Leu Asp Thr Gln 
325 330 335 

Tyr Ser Lys Val Leu Ala Leu Tyr Asn Gln His Asn Pro Gly Ala Ser 
340 345 350 

Ala Ala Pro Cys Cys Val Pro Gln Ala Leu Glu Pro Leu Pro Ile Val 
355 360 365 

Tyr Tyr Val Gly Arg Lys Pro Lys Val Glu Gln Leu Ser Asn Met Ile 
370 375 380 

Val Arg Ser Cys Lys Cys Ser 
385 390 
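
The coordinates given for SEQ ID NO: 76 and the length given for SEQ ID NO: 77 can be cross-checked with simple arithmetic. The sketch below assumes the CDS feature (842)..(2017) includes the terminal stop codon, as the printed translation suggests; the function name cds_summary is hypothetical.

```python
def cds_summary(start: int, end: int):
    """Return (nucleotides, codons, residues) for a CDS given 1-based,
    inclusive coordinates, assuming the last codon is a stop codon."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError("CDS length is not a multiple of three")
    codons = length // 3
    return length, codons, codons - 1


# CDS feature of SEQ ID NO: 76: <222> (842)..(2017)
print(cds_summary(842, 2017))
```

The residue count obtained this way agrees with the <211> value of 391 declared for SEQ ID NO: 77.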
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